
How to reduce the impact of IT outages

(Image credit: Shutterstock/hafakot)

Digital transformation is well underway in many organisations. IDC forecasts that businesses worldwide will spend nearly $2 trillion on digitalisation projects by 2022. As the adoption of new technologies continues at pace, more and more customer touchpoints are becoming digital. And as such, availability is increasingly mission-critical. Customers expect a seamless, efficient and reliable experience on all channels, at all times.

However, keeping services running smoothly is not always easy, as the major 2019 IT outages at British Airways, Target, Facebook and Twitter showed. A recent IT Outage Impact Study commissioned by LogicMonitor, the leading provider of unified IT infrastructure monitoring, examined what it takes to keep the lights on and services running.

The independent survey of 300 IT decision makers in the UK, US, Canada, Australia and New Zealand investigated how organisations approach the task of detecting, mitigating and preventing IT issues. It brought to light a reality that stands in stark contrast to the image of the brave new digital world: businesses are concerned about their ability to avoid costly outages, mitigate downtime and reliably provide the availability that customers and partners demand.

How to combat costly downtime

Overall, survey respondents agreed that availability and performance were their top priorities, ahead of security and cost. But although IT teams were intensely focused on keeping their networks running at peak performance, they were still unable to prevent downtime. Ninety-six per cent of surveyed businesses admitted that they had experienced at least one IT outage in the past three years.

Surprisingly, respondents also reported that more than half of downtime could have been prevented. The most common causes of downtime uncovered were network and infrastructure failures, human error, surges in usage and software malfunction – some of which could have been detected and remediated before they affected service quality.

When asked how confident they were in their ability to prevent future outages, IT decision makers were pessimistic. More than half (53 per cent) expected to experience a brownout (defined as a period of dramatically reduced or slowed service) or an outage so severe that it would make national media headlines. The same proportion of respondents were worried that someone within their organisation could lose their job as a result of a severe outage.

Negative press coverage, reputational damage and possibly severe career implications aside, downtime is also costly for organisations. Survey respondents cited a drop in productivity, lost revenue and compliance-related costs as consequences of both IT outages and brownouts. These costs can add up quickly. On average, organisations with recurrent outages and brownouts reported costs associated with mitigating downtime that were 16 times higher than those of organisations with fewer outages. In addition, troubleshooting at organisations with frequent outages took nearly twice the number of team members and double the time.

How to improve availability

According to LogicMonitor’s global IT survey, more than half of outages and brownouts are avoidable, so every business should be working proactively to prevent disruptions. Yet even highly skilled IT professionals seem unsure of how to tackle the task. Careful planning, a well-prepared team and comprehensive monitoring software all go a long way towards helping organisations minimise downtime.

Here are some steps every organisation can take to prevent outages and avoid downtime:

  • Implement comprehensive IT monitoring. Many organisations run a hybrid IT environment that combines infrastructure both on-premises and in the cloud. Using separate monitoring tools for each platform is not only inefficient, but also prone to error. Instead, businesses should choose a solution that spans their entire infrastructure landscape and lets the team monitor IT systems through a single pane of glass. To ensure the solution integrates with present as well as future technologies, select a platform that can scale and expand.
  • Use a monitoring solution that gives the team early visibility into trends that indicate trouble is brewing. Data forecasting is a useful tool; it allows organisations to identify impending failures and prevent issues before they impact the business. Early alerts enable teams to fix single points of failure that might cause a system to go down. An additional way of preventing downtime is to build a high level of redundancy into the monitoring platform.
  • Make sure you have a detailed response plan for IT outages. Define responsibilities as well as processes on who to involve and when. This emergency plan may never be needed, but it’s imperative to have clear procedures for managing an outage, from escalation and remediation to communication and root cause analysis.
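
To make the data forecasting step above more concrete, here is a minimal sketch in Python. It is not how LogicMonitor or any particular monitoring product works. It simply assumes that a resource metric (here, hypothetical disk usage percentages sampled hourly) grows roughly linearly, fits a least-squares line to recent samples, and estimates how many hours remain before an assumed 90 per cent threshold is crossed. The function name, the sample data, the threshold and the 24-hour alert window are all illustrative assumptions.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    hours: Sequence[float],
    usage_pct: Sequence[float],
    threshold_pct: float = 90.0,  # assumed alert threshold, not from the survey
) -> Optional[float]:
    """Estimate hours until a metric crosses a threshold, using a linear trend.

    Returns 0.0 if the latest sample is already at or above the threshold,
    and None if the fitted trend is flat or falling.
    """
    n = len(hours)
    if n < 2 or n != len(usage_pct):
        raise ValueError("need at least two paired samples")
    if usage_pct[-1] >= threshold_pct:
        return 0.0

    # Ordinary least-squares fit: usage = intercept + slope * hour
    mean_t = sum(hours) / n
    mean_u = sum(usage_pct) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")
    slope = sum(
        (t - mean_t) * (u - mean_u) for t, u in zip(hours, usage_pct)
    ) / var_t
    if slope <= 0:
        return None  # no upward trend, so no projected crossing

    intercept = mean_u - slope * mean_t
    crossing_hour = (threshold_pct - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


# Hypothetical samples: disk usage rising two points per hour.
sample_hours = [0, 1, 2, 3, 4]
sample_usage = [70, 72, 74, 76, 78]

eta = hours_until_threshold(sample_hours, sample_usage)
if eta is not None and eta <= 24:  # assumed early-warning window
    print(f"Early warning: threshold projected in {eta:.1f} hours")
```

With these sample values the projection is six hours, which would give a team time to act before users notice. Real metrics are rarely this linear, so a production tool would typically account for seasonality and noise; the point of the sketch is only that an alert on the projected trend fires earlier than an alert on the current value.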

Alongside data, availability has become the most valuable commodity of the digital age. No one is immune to IT failures – but those that take the right preventative measures can greatly reduce the impact of downtime on their organisations.

Mark Banfield, Chief Revenue Officer, LogicMonitor