Server downtime – it happens. Most large companies have at some point experienced the frustration of having their servers down, whether for a few minutes, hours or even days. It’s regarded as a fact of life, something that can’t quite be prevented. In reality, server downtime costs companies millions of dollars every year, both in lost business and in the distraction and psychological toll on the employees who have to deal with the problem. Much of it can be prevented by avoiding common server issues, adopting agile working practices and having a plan in place for when emergencies do happen.
The true cost of server downtime depends on the type of business you are running and the revenue your website typically brings in each day. Even a seemingly small amount of downtime can really hurt your business. Think of it this way: if your website achieves 99 per cent uptime in a given year, that still translates to 3.65 days during which your business is losing revenue.
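The arithmetic behind that figure is simple enough to sketch. The revenue-per-hour value below is purely illustrative – plug in your own numbers:

```python
# Translate an uptime percentage into yearly downtime and lost revenue.
# The $500/hour figure is an illustrative assumption, not a benchmark.
HOURS_PER_YEAR = 365 * 24

def downtime_cost(uptime_pct: float, revenue_per_hour: float):
    """Return (downtime hours per year, revenue lost) for a given uptime."""
    downtime_hours = HOURS_PER_YEAR * (1 - uptime_pct / 100)
    return downtime_hours, downtime_hours * revenue_per_hour

hours, lost = downtime_cost(99.0, 500)
print(f"{hours:.1f} hours down = {hours / 24:.2f} days, ${lost:,.0f} lost")
```

At 99 per cent uptime that is 87.6 hours – the 3.65 days mentioned above – and the lost-revenue line scales linearly with whatever hourly figure fits your business.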
Small businesses are usually hit hardest by server downtime: because of the overhead costs, SMEs tend to avoid building the server redundancies that minimise downtime. This can have a huge impact on profits for a business operating on thin margins – but once again, it is difficult to quantify.
Larger businesses can afford more, and can thus invest in insuring against server downtime with multiple layers of redundancy. The absolute cost of potential downtime is much larger for them than it would be for a smaller business, so most prefer to have server redundancies in place.
What are the common causes?
Servers are often struck down by the simplest issues. One of the more frequent problems is running out of disk space, which we have seen affect a surprising number of our customers, at potentially great cost. Applications running on servers with low disk space will crash, misbehave and freeze, wasting your employees’ time and effort on problems that could have been avoided.
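A basic disk-space check is trivial to automate. Here is a minimal sketch using Python’s standard library; the 80 per cent threshold and the root path are assumptions for the example, not recommendations:

```python
# Minimal disk-space health check. Threshold and path are illustrative.
import shutil

def disk_alert(path: str = "/", warn_pct: float = 80.0) -> bool:
    """Return True (and warn) if used space on `path` exceeds warn_pct."""
    usage = shutil.disk_usage(path)
    used_pct = usage.used / usage.total * 100
    if used_pct > warn_pct:
        print(f"WARNING: {path} is {used_pct:.1f}% full")
        return True
    return False

disk_alert("/")
```

Run on a schedule (cron, or a monitoring agent), a check like this catches the problem long before applications start failing.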
On the financial side, companies can rack up high cloud server costs, particularly if they are on a pay-as-you-go basis. Network fees may be inflated by inefficiencies ranging from seemingly simple problems, such as oversized images and heavy documents, through to poor application design, e.g. overly verbose protocols or chatty database replication.
Servers and software can also be hit by bugs or malware. Whatever type of website your business runs, starting with the most frequent and common causes of downtime is always the best approach. No one is immune to simple mistakes.
It’s wrong to assess the impact of server downtime purely in terms of the financial value of lost business: downtime, and the on-call work schedules required to be ready for it, takes a toll on the mental and physical health of your IT staff. Recent studies by Tel Aviv University suggest interrupted sleep can be worse than no sleep at all. Workers monitoring servers are expected to work at odd hours, never knowing when they will need to react quickly to a stressful emergency. This stress can be reduced by automating and prioritising the alerts servers send out.
For many IT workers, the primary issue is the sheer quantity of alerts they receive, making it difficult to prioritise what to work on at any given time. These alerts are constant interruptions, and constant task-switching takes an often underestimated toll on workers’ concentration. Our research suggests it takes 23 minutes to regain concentration after an interruption – these minutes add up to hours and days of lost time over the course of a year.
In December 2016 alone, our customers received 1.5 million individual alerts; clearly there needs to be some way of triaging these distractions so that the most important work is the work that gets done.
The advent of cloud systems demands adaptation and change, yet many companies have simply moved legacy systems into their new cloud environment. That isn’t enough to get the full benefits of cloud hosting – fast product iteration, flexibility and scalable capacity. If you bring a traditional mindset to a cloud system, you are likely to incur higher costs, more frustration and lost time. The right way to adapt to cloud systems is through automation and APIs.
Being an agile team is a necessity nowadays, and in infrastructure management that means implementing DevOps principles. Development teams can and should run their own systems, while operations teams maintain the underlying platform. Developers should request, manage and monitor resources, and take responsibility for the availability and uptime of their systems.
With cloud deployments, developers can now adopt SaaS products to handle things like storage, queuing, monitoring and email delivery, rather than having to build them. This frees their time to build the unique aspects of their product, leaving the undifferentiated heavy lifting to the SaaS vendors.
Get a plan together
Even if you follow the advice above, your systems will inevitably suffer downtime at some point. While it may seem obvious, a strong incident plan is crucial to a quick recovery. As a company, you should be making regular backups and regularly testing those backups. It’s also useful to have a checklist of what needs to happen, and who to contact, should an emergency develop.
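Testing a backup means actually restoring it and verifying the result, not just checking that a file exists. A minimal sketch of that idea, using file copies as stand-ins for a real backup and restore tool:

```python
# Sketch of an automated restore test: "back up" a file, restore it,
# and verify the checksum matches. The copy steps are stand-ins for
# whatever backup tooling you actually use.
import hashlib
import os
import shutil
import tempfile

def sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "data.db")
    backup = os.path.join(tmp, "data.db.bak")
    restored = os.path.join(tmp, "restored.db")

    with open(original, "w") as f:
        f.write("important records")

    shutil.copy(original, backup)    # stand-in for the backup job
    shutil.copy(backup, restored)    # stand-in for the restore step
    ok = sha256(original) == sha256(restored)
    print("restore verified:", ok)
```

A backup that has never been restored is only a hope; scheduling a check like this alongside the backup job itself turns it into evidence.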
Server downtime is a lot more expensive than people make it out to be. With the technology available, and simple common sense, it should be easy to lower the human and monetary costs of downtime. Common server errors can be avoided through a combination of standard agile working practices, basic server health checks for hard disk space and other key metrics, and a strong downtime recovery plan. With these three elements in place, server downtime should not be something your business needs to worry about.
David Mytton, founder, CEO, Server Density
Image Credit: Scanrail1 / Shutterstock