Infrastructure as Code: Why it’s vital you build it right

The enabling idea of infrastructure as code is that the systems and devices which are used to run software can be treated as if they, themselves, are software.

Using technologies like cloud, virtualisation and configuration automation to manage IT infrastructure means that infrastructure is decoupled from the underlying hardware, turning it into data and code.

This transformation makes it possible to bring tools and practices from software development to infrastructure management. Many of the principles underpinning infrastructure as code will be familiar to technologists well versed in agile or XP. Chief among them is the assertion that change is inevitable, so finding new ways to embrace change should be front of mind.

Build the right thing, and build the thing right.

Good software engineering practices produce high quality code and systems. Quality is sometimes seen as a simple matter of functional correctness; in reality, it is an enabler of change. The best way to measure the quality of a system and its code is to see how quickly and safely changes can be made to it. Poor quality systems are difficult to change, and what looks like a simple change often requires pulling apart large sections of code, sometimes creating even more of a mess.

The same is often true with infrastructure, even without automation. Different people have built, changed, updated, optimised, and fixed various parts of the systems over time, meaning the whole interrelated web of parts can be precarious, with any change to one part having the potential to break one or more others.

Defining system infrastructure as code and building it with tools doesn’t automatically make the quality better. At worst, it could result in fragile infrastructure where running the wrong tool has catastrophic effects. Infrastructure as code shifts the focus of quality to the definitions and tooling systems. It is therefore essential to structure and manage automation so that it has the virtues of quality code - easy to understand, simple to change, with fast feedback on problems.

Over time, a codebase grows and can become difficult to maintain. The same thing happens with infrastructure definitions, meaning many of the same principles and practices can be used to make maintaining large infrastructure codebases easier.

Clean code

Recently, there has been a renewed focus on “clean code” and software craftsmanship, which is as relevant to infrastructure development as it is to application development. Many people see a tension between pragmatism - getting things done - and engineering quality - building things right. This is a false dichotomy.

Craftsmanship is about making sure that what you build works in the correct way. It means building systems that another professional can quickly and easily understand. When you make a change to a cleanly built system, you should be confident that you understand what parts of the system that change will affect. Clean code and software craftsmanship should not lead to over-engineering.

If you only build what you need, it becomes easier to make sure what you have built is correct.

Manage technical debt

“Technical debt” is a metaphor for problems in a system that are left unfixed; as with financial debt, the system charges interest. Software craftsmanship is largely about avoiding technical debt. The way to do that is to make a habit of fixing problems and flaws as you discover them, preferably as you make them, rather than falling into the bad habit of thinking “it’s good enough for now.”

Some people dislike the term “technical debt” as a metaphor for poorly implemented systems, because it implies a deliberate, responsible decision, similar to that of borrowing money to start a business. However, it’s worth considering that there are many different types of debt, and quickly knocking out code that will be difficult to change and maintain is like taking a payday loan to pay for a vacation - it runs a serious risk of bankrupting you.

Fast feedback

A cornerstone of high quality systems is fast feedback on changes. When I make a mistake in a change to a configuration definition, I’d like to find out about that mistake as quickly as possible; the shorter the loop between making a change and being notified that it presents a problem, the easier it is to find the cause.
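As a rough illustration of a short feedback loop, the sketch below checks a configuration definition before it is applied anywhere and reports every problem it finds. The field names (name, image, instance_count) and the function validate_definition are hypothetical, chosen only for this example; a real team would check whatever its own definitions require.

```python
# Minimal sketch: validate a configuration definition before applying it.
# The required fields below are assumptions made for illustration only.
REQUIRED_FIELDS = ("name", "image", "instance_count")


def validate_definition(definition):
    """Return a list of problems; an empty list means the definition passed."""
    problems = []

    # Report every missing field, not just the first, so one run gives
    # the author complete feedback.
    for field in REQUIRED_FIELDS:
        if field not in definition:
            problems.append(f"missing required field: {field}")

    # Check the value only if it is present; absence was reported above.
    count = definition.get("instance_count")
    if count is not None:
        is_int = isinstance(count, int) and not isinstance(count, bool)
        if not is_int or count < 1:
            problems.append("instance_count must be a positive integer")

    return problems
```

Running a check like this on every change, for example as the first step of an automated build, means a mistake surfaces seconds after it is made rather than when a server fails to start.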

Introducing Continuous Integration

These practices all come together with Continuous Integration (CI), the process of frequently integrating and testing all changes to a system as they are being developed. CI tools, including Bamboo, Jenkins, GoCD, SnapCI, TeamCity and TravisCI, can be used to enable this practice, but it’s important to note that Continuous Integration is not the practice of using a CI tool, but the practice of frequently integrating all changes.

With Continuous Integration, all developers on a team commit their changes to the trunk of the codebase. Every time a commit is made, the CI tool builds the codebase and runs an automated test suite. The benefit of this approach is fast feedback when a change doesn’t build correctly, or causes a test to fail, meaning it is immediately clear which set of changes caused the issue.

Every failed build or test in the CI system needs to be addressed immediately. Ignoring a failure allows errors to pile up, and they become difficult to untangle later. A failed run in the CI tool triggers a “stop the line” situation, known by development teams as a broken or red build. Nobody else in the team should commit any changes until the error is fixed.
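The “stop the line” idea can be sketched in a few lines. The example below is a simplified model rather than how any particular CI tool works: it runs a set of checks against each commit in order and stops at the first red build, so it is clear which change introduced the problem. The names first_red_commit, commits and checks are hypothetical, and each commit is represented here as an identifier plus a plain dictionary standing in for the state of the codebase.

```python
# Simplified model of a CI run: every commit is checked in order and the
# line stops at the first failure. All names here are illustrative.

def first_red_commit(commits, checks):
    """Return (commit_id, failed_check_names) for the first red build,
    or None if every commit passes every check.

    commits: list of (commit_id, snapshot) pairs, oldest first
    checks:  dict mapping a check name to a function(snapshot) -> bool
    """
    for commit_id, snapshot in commits:
        failed = []
        for name, check in checks.items():
            try:
                passed = check(snapshot)
            except Exception:
                # A check that crashes counts as a failure, not a pass.
                passed = False
            if not passed:
                failed.append(name)
        if failed:
            # Stop the line: later commits are not evaluated until
            # this one is fixed.
            return commit_id, failed
    return None
```

Because each commit is checked on its own, a failure points at one small set of changes, which is the practical benefit of integrating frequently.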

CI for infrastructure

For infrastructure as code, CI can be used to continuously test changes made to definition files, scripts, and other tooling, as well as configuration written and maintained for running the infrastructure.

Each of these should be managed in a version control system (VCS) and teams should avoid branching, in order to avoid building up an increasing “debt” of code that will need to be merged and tested.

Each commit should trigger some level of testing, and you should put everything in a VCS that is needed to build and rebuild elements of your infrastructure. Ideally, if your entire infrastructure were to disappear, other than the contents of version control, you should be able to check everything out and run a few commands to rebuild everything back to how it was, pulling in backup data files as needed.

Some examples of things to version:

  • Scripts and source code for compiled utilities and applications
  • Configuration files and templates
  • Configuration definitions (Cookbooks, Manifests, Playbooks, etc.)
  • Test code
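To make the list above concrete, here is a small sketch of a configuration template kept alongside its test code, so that both can live in the same repository and be run on every commit. The template contents, the render_config function and the example hostname are assumptions made for illustration; the rendering uses Python's standard string.Template.

```python
from string import Template

# Hypothetical web server configuration template, versioned with the code.
SERVER_TEMPLATE = Template(
    "server {\n"
    "    listen $port;\n"
    "    server_name $hostname;\n"
    "}\n"
)


def render_config(port, hostname):
    """Fill in the template; substitute() raises KeyError if a value is missing."""
    return SERVER_TEMPLATE.substitute(port=port, hostname=hostname)


# Test code, versioned next to the template it exercises.
def test_renders_port_and_hostname():
    rendered = render_config(8080, "app.example.com")
    assert "listen 8080;" in rendered
    assert "server_name app.example.com;" in rendered


def test_missing_value_fails_loudly():
    try:
        SERVER_TEMPLATE.substitute(port=8080)
    except KeyError:
        return
    raise AssertionError("expected KeyError for missing hostname")
```

Functions named test_* in this style can be collected by a test runner such as pytest, which makes them easy to run from a CI job on each commit.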

Back to building the right thing, and building it right

The underlying theme of these practices is quality. Teams who prioritise the quality of their systems, by getting continuous feedback and acting on it immediately, create a virtuous cycle.

They have the confidence to routinely make the small fixes and tweaks that keep their systems humming smoothly, which gives them more time to spend on more valuable, higher-order work rather than fighting fires that could have been prevented.

Kief Morris, ThoughtWorker and author of Infrastructure as Code
