With the volume and variety of critical IT events growing every day, shortening incident response time is the new business imperative. After all, the longer a company takes to address an IT issue, the more severe the impact on the organisation, its customers and its users, among others. In other words, it does not matter that businesses are able to identify a ransomware attack in seconds if it then takes hours to assemble a response team, respond appropriately and restore service. With data breaches and service disruptions, time is literally money. According to the Ponemon Institute, companies across all industries lost an average of $8,900 a minute to downtime, a staggering price tag that doesn't even include the damage to employee productivity, customer satisfaction, brand reputation, revenue generation or other key areas.
How then can enterprises implement effective communications strategies among their key IT and security stakeholders to not only shorten their company’s time to respond to a major incident, but also improve and build efficiencies within their organisation?
Critical IT issues happen all the time and come from several sources. A 2018 IDG survey found that more than half (56 per cent) of respondents reported having had one to five such incidents in the previous 12 months. Furthermore, 85 per cent of respondents had at least one major IT incident in the last year. The same survey found that most critical IT incidents originate from service operations – such as hardware failure, application latency and service outages – and security operations, like cyberattacks. DevOps comes in third position at 43 per cent.
Early issue detection to minimise the impact
We all know the motto of application performance management (APM): detect IT performance issues before end-users feel the pain. When things go wrong with IT, the first thing that needs to happen is to make IT staff aware of the issue. Most companies do this well: 71 per cent have established a dedicated response team for major incidents, and have implemented solutions for failure detection, performance monitoring, event correlation, IT operations management, security information and event management, application performance management, or ITSM and ticketing.
However, these teams often lack effective tools that automate the incident response process. Most companies manually handle the tasks that engage the response team, such as determining the best people to contact based on the type of issue or location, sending alerts to these people and setting up a conference call. The IDG survey found that companies on average need 25 to 39 minutes just to conduct these initial tasks in the incident response process and assemble a team. If we apply Ponemon’s $8,900 per minute figure, this means the incident will have already cost $222,500 to $347,100 before the response team even has a chance to meet. To mitigate the risks (and associated costs) of downtime, an incident response team must decrease the time it takes to engage the team, and streamline communication with non-IT staff, managers and other stakeholders at key moments. This is the key to resolving the incident as soon as possible.
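The arithmetic behind that range can be checked with a short sketch. It assumes a flat per-minute cost, which is a simplification; the function and constant names are illustrative only.

```python
# Inputs taken from the figures quoted above.
COST_PER_MINUTE_USD = 8_900      # Ponemon Institute average cost of downtime
ENGAGEMENT_MINUTES = (25, 39)    # IDG range for assembling a response team


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the cost of `minutes` of downtime at a flat per-minute rate."""
    return minutes * cost_per_minute


low, high = (downtime_cost(m) for m in ENGAGEMENT_MINUTES)
print(f"${low:,.0f} to ${high:,.0f}")  # $222,500 to $347,100
```

Substituting an organisation's own per-minute cost and engagement time gives a rough first estimate of what a slow start costs.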
Proactive communication means fewer tickets to deal with
Think about it – if an IT organisation can proactively communicate with all the potentially impacted business users of the disrupted service, customers most likely won’t need to call or open a ticket at the service desk. As a result, the number of inbound tickets falls, in most cases by a sizeable amount. One Boston-based cloud hosting company reported that it saves around a million dollars every year by avoiding incremental service desk costs related to major IT incidents. During these incidents, impacted business users usually want only one thing: to make sure that IT is aware of the pain they are experiencing and is doing something about it.
How does IT do this? They automate the process of identifying the groups that may be impacted by a disruption and notify these affected customers and end-users. The notification might look like this:
“The <xyz system> is not available at the moment. The IT team is aware and working on resolving the issue. We’ll keep you updated and will let you know when restored. Sorry for the inconvenience...”
This message can be shared simultaneously in many ways via different channels (intranet, emails, SMS, voice, etc.). As you can imagine, the more targeted the notification, the higher the impact on reducing the number of inbound calls into the service desk. That way, IT doesn’t waste any time compiling name lists based on who’s using the service/app or location. With automation, pre-approved message templates can very easily be used, so the IT team doesn’t even need to write a message from scratch and get the message validated each time an incident occurs.
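A minimal sketch of this idea, assuming a simple in-memory list of users: the `User` record, the `build_notifications` helper and the template wording are all hypothetical, and actual delivery through email, SMS or voice is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from string import Template

# Hypothetical pre-approved template, modelled on the sample message above.
OUTAGE_TEMPLATE = Template(
    "The $service is not available at the moment. The IT team is aware and "
    "working on resolving the issue. We'll keep you updated and will let you "
    "know when it is restored. Sorry for the inconvenience."
)


@dataclass(frozen=True)
class User:
    name: str
    services: frozenset[str]   # services this user depends on
    channels: tuple[str, ...]  # preferred channels, e.g. ("email", "sms")


def build_notifications(service: str, users: list[User]) -> list[tuple[str, str, str]]:
    """Return (user name, channel, message) for every user of `service`."""
    message = OUTAGE_TEMPLATE.substitute(service=service)
    return [
        (user.name, channel, message)
        for user in users
        if service in user.services   # target only the impacted users
        for channel in user.channels  # one notification per preferred channel
    ]


users = [
    User("alice", frozenset({"payroll system", "crm"}), ("email", "sms")),
    User("bob", frozenset({"crm"}), ("email",)),
]
for name, channel, text in build_notifications("payroll system", users):
    print(f"[{channel}] to {name}: {text}")
```

Here only alice is notified, once per channel, because bob does not use the affected service. In practice the user-to-service mapping would come from a directory or service catalogue rather than a hard-coded list.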
With a significant reduction in the number of tickets that would otherwise have been opened with the service desk, IT can also expect greater customer satisfaction, even during major incidents.
Fast triage and IT resolver engagement means faster resolution
Once IT knows about a severe enough issue, it needs to get the right triage team and IT resolvers on deck immediately, wherever they might be and whenever that might be – during the day or in the middle of the night. A recent survey from Fintech Futures shows that each organisation is unique in the way they deal with this activity. For more than half of businesses it takes more than an hour to assemble the right cross-functional response team.
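One way to picture automated engagement is a roster lookup that returns whoever is on call for a given skill at a given moment, including overnight shifts. This is a sketch under stated assumptions: the `Shift` record, the `on_call` helper and the roster itself are hypothetical, and shift hours are assumed to be expressed in UTC.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Shift:
    responder: str
    skill: str        # e.g. "network", "database", "security"
    start_hour: int   # shift start in UTC, inclusive
    end_hour: int     # shift end in UTC, exclusive; may wrap past midnight

    def covers(self, hour: int) -> bool:
        if self.start_hour <= self.end_hour:
            return self.start_hour <= hour < self.end_hour
        # Overnight shift such as 22:00 to 06:00.
        return hour >= self.start_hour or hour < self.end_hour


def on_call(roster: list[Shift], skill: str, when: datetime) -> list[str]:
    """Return responders with `skill` whose shift covers the aware datetime `when`."""
    hour = when.astimezone(timezone.utc).hour
    return [s.responder for s in roster if s.skill == skill and s.covers(hour)]


roster = [
    Shift("dana", "database", 22, 6),
    Shift("eli", "database", 6, 14),
    Shift("fay", "network", 0, 24),
]
incident_time = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
print(on_call(roster, "database", incident_time))  # ['dana']
```

A real system would add escalation when nobody acknowledges the alert; the point here is only that the "who do we call at 3 a.m." question can be answered by a lookup rather than by a person.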
Lean IT principles will tell you this is a waste in the process. It does not produce value and actually can make the situation even worse. At this point in time, more than an hour after a major incident has been detected, the IT team still hasn’t started investigating the issue and therefore, hasn’t identified the root cause.
It’s no wonder that coordinating incident response across the organisation is the biggest challenge for most enterprises.
Is IT Response Automation something you’ve thought about?
Digging deeper into why it appears to be so hard for IT organisations to engage the response teams in a timely manner, it’s interesting to see that most of the underlying activities are still performed manually. The aforementioned IDG research shows that businesses realise they could do better, but only about 40 per cent of them have engaged in some level of process automation.
It’s now proven that automating major incident communications, escalation and collaboration can engage an IT response team in five minutes or less – even when team members are spread across multiple sites and time zones – translating into hundreds of thousands of dollars in cost avoidance and a significant, consistent reduction in the mean time to resolution. Applying the Ponemon Institute’s per-minute cost figure to the time it takes to engage a response team, automation enables an IT team to respond to IT events four times as fast and save an estimated $133,500 to $222,500 per incident.
While automation may be seen as low-hanging fruit, it can have the most significant positive impact on IT business performance. Business demands for IT services are only going to increase, so major IT incidents are going to be more severe when they occur. Automated incident response is the only realistic answer to cope with the pace at which IT incidents need to be resolved in order to minimise their impact on business.
Vincent Geffray, Senior Director of Product Marketing, IT Alerting and IoT, Everbridge