Well-intentioned and well-thought-out development policies within an organisation often get ignored in the fast-moving cloud era, putting the company's overall resilience and even its security at risk. Companies can't afford to hold up work while an inspector checks each developer's output, and developers can't pause to verify compliance with company policy every time they change code, lest they fall behind schedule.
It's a brave new world in the tech business these days. Super-specialisation in skill sets, the difficulty of finding qualified personnel, and a nearly unprecedented tight job market have combined to produce a hiring blitz in the DevOps space as cloud computing surges.
However, many of these new DevOps workers are given minimal training; they already have the skills, and the work needed to get done yesterday. How do organisations ensure the integrity of their code? How do they ensure that all teams are aligned, and that errors have not seeped into the process because someone was not following best practices?
R&D teams are given a general direction and framework for managing their development and environments, with each team responsible for a specific application or service. They are guided by the DevOps and SRE groups, which lead the process with a set of policies and recommendations on coding and configuration covering security, compliance, availability, performance, cost optimisation, collaboration and more.
This is especially true in modern microservice architecture environments, where disparate elements have to “live in peace,” allowing coders to use the various services they need to accomplish their goals. The job of DevOps and SRE teams is to ensure that all users of the services are able to do their work without tripping over themselves or others.
But things move very fast in our cloud-based world, particularly as organisations adopt microservices architecture. Code that was written in the morning can be uploaded to cloud repositories immediately, and a second team working on another aspect of the project may be caught by surprise. But they have to keep pace, and so they make their additions and adjustments.
How robust is your checkpoint testing?
As a result, coders – especially the experienced ones – may ignore best practices. While DevOps determines the overall policy that everyone should follow, the responsibility for actually following and implementing it rests with the low-level developer, who may have only recently joined the team. In the rush to keep pace, coders may cut corners, and even small errors can cause major problems down the line, requiring many hours of detection, remediation and recoding to correct.
To compensate for those errors, organisations set up checkpoints that inspect code at various stages of development, testing the code for integrity on its own, or for errors in the context of a larger application. Tests include QA sanity, integration, performance, security, resiliency and others.
The question for organisations, then, is how robust their checkpoint testing is, and in many organisations the answer is “not very.” Many don't test for issues that could be the direct result of mistakes made by staff. For example, code that was developed and deployed in a staging environment might fail when subjected to a heavy CPU load because staff did not account for that load. Sometimes applications fail when they interact with other applications or systems, or when another process begins running.
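A checkpoint of this kind can be pictured as a list of named checks run against a staged build. The following is a minimal sketch only: the `Build` fields, the check names and the 85% CPU limit are hypothetical, chosen to mirror the sanity, load and integration examples mentioned above rather than any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Build:
    """Hypothetical summary of a staged build; all fields are illustrative."""
    unit_tests_passed: bool
    peak_cpu_percent_under_load: float  # measured during a load test
    integration_errors: int             # failures when run alongside other services


# A check takes a build and returns (passed, check name).
Check = Callable[[Build], Tuple[bool, str]]


def check_sanity(build: Build) -> Tuple[bool, str]:
    return build.unit_tests_passed, "QA sanity tests"


def check_cpu_load(build: Build, limit: float = 85.0) -> Tuple[bool, str]:
    # The 85% limit is an assumed threshold, not a recommendation.
    return build.peak_cpu_percent_under_load <= limit, "CPU headroom under load"


def check_integration(build: Build) -> Tuple[bool, str]:
    return build.integration_errors == 0, "integration with other services"


def run_checkpoint(build: Build, checks: List[Check]) -> List[str]:
    """Run every check and return the names of those that failed."""
    failures = []
    for check in checks:
        passed, name = check(build)
        if not passed:
            failures.append(name)
    return failures


CHECKS: List[Check] = [check_sanity, check_cpu_load, check_integration]
```

With this sketch, a build that passes its sanity tests but peaks at 97% CPU under load would be reported as failing only the "CPU headroom under load" check, which is the kind of issue a sanity-only checkpoint would miss.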
DevOps and continuous delivery practices are widely used today. Organisations that are more agile and faster to market can better respond to changing business needs. Development, QA, and operations teams face the challenge of incorporating validation into the product lifecycle, across both development and deployment, without slowing down the process or sacrificing agility. Proactive resilience validation catches risks early and costs less than fixing flaws in production. With checks and analysis built into the continuous deployment pipeline, DevOps can find and fix vulnerabilities early and accelerate an organisation’s time-to-market. To accomplish this, four elements are needed when implementing checkpoint testing:
Compensating for the lack of experience
Automation: With the workloads faced by teams, the large amount of code that has to be examined, and the hectic pace of work, it is unrealistic to expect a manual examination of code to solve the problem. Without automation, organisations have no chance of managing proactive resilience validation. Challenges to resilience that result from technology must be resolved by technology. A basic requirement for resolving code issues is the automation of checkpoints that can examine and test all the disparate elements in a system.
Robustness: So you have an automated checkpoint testing system, but is it testing for the right elements, or for all elements? Many of the issues in code are not necessarily visible to basic automated QA systems. As mentioned, CPU load and the interaction of disparate elements in an operating environment are often not checked. A good automated checkpoint system will examine those elements and issues.
Knowledge: To do that, the system must be knowledgeable enough about the issues it is checking. Systems need to be programmed to test for the various permutations that code could take, and for the environments and issues it could be subjected to. Examining the code from this perspective will minimise the problems that can crop up. Reporting those issues to DevOps staff will also help educate them on which best practices to adopt in order to produce higher-quality work and more efficient results.
Proactive validation: To reduce the risk level, organisations should integrate tests as part of the deployment process. This integration is done with CI/CD tools such as Jenkins, Concourse and Codefresh. During the deployment process, when the staging environment is deployed and integration tests are conducted, organisations need to validate the absence of resiliency and security risks in the environment. This can be accomplished by integrating automated tests that analyse the environment, find all the risks, and calculate a risk score for the deployment. Based on the score, teams can decide to proceed with the deployment or to retool.
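The risk-score decision described under proactive validation can be illustrated with a short sketch. Everything here is an assumption made for illustration: the severity weights, the threshold of 20, and the example risks are invented, and a real validation tool would define its own scale and findings.

```python
from dataclasses import dataclass
from typing import List

# Assumed severity weights; a real tool would define its own scale.
SEVERITY_WEIGHTS = {"low": 1, "medium": 5, "high": 20}


@dataclass
class Risk:
    name: str
    severity: str  # one of the keys in SEVERITY_WEIGHTS


def risk_score(risks: List[Risk]) -> int:
    """Sum the weights of all risks found in the staging environment."""
    return sum(SEVERITY_WEIGHTS[r.severity] for r in risks)


def should_deploy(risks: List[Risk], threshold: int = 20) -> bool:
    """Proceed only if the total score stays below the assumed threshold."""
    return risk_score(risks) < threshold


# Hypothetical findings from analysing a staging environment.
found = [
    Risk("single-zone database", "high"),
    Risk("missing health check", "medium"),
]
print(risk_score(found), should_deploy(found))  # prints: 25 False
```

In a pipeline, a step like this would typically run after the integration tests, and a `False` result would stop the deployment so the team can retool before trying again.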
Thus, a robust, automated system of checkpoints can help compensate for the lack of experience of newer team members, and up the game of experienced members. The system not only compensates for mistakes and errors but also helps educate team members on what they should be doing, and what they should be avoiding, to produce better code. It's a win-win for them, and for the organisation.
Avi Aharon, Vice President, Head of Cloud Business, Continuity Software