How IT infrastructure monitoring enlightens enterprises to help meet SLAs

(Image credit: Bbernard / Shutterstock)

Recent research reveals that nearly 90% of enterprises are failing to meet business-critical SLAs for mean time to resolution of infrastructure issues, the cause of nearly half of all application problems. This article lifts the lid on this new research and explains why applications should be at the centre of IT strategy, why proper infrastructure monitoring tools are a necessity for meeting service level agreement (SLA) requirements, not a ‘nice to have’ – and why enterprises that fail to deploy them are gambling with their ability to deliver services to customers.

A wing and a prayer

It’s 9am and, unbeknown to the IT staff of a well-known UK bank, the marketing team is about to launch a major new credit card promotional campaign. It’s a peak time, when a large majority of customers typically log on to carry out their banking transactions. By 9.30am the systems are struggling: the extra workload is competing for system resources and the infrastructure is experiencing latency issues, which have begun to impact critical applications and customers’ banking experiences. By 9.45am the IT team is still scratching its head, desperately trying to pinpoint the root cause.

The Application Performance Monitoring tool reports a slowdown in service, but the fault isn’t in the code or in any of the areas it monitors. Each of the plethora of infrastructure monitoring tools in place receives data only on its own siloed system, and every one of those systems appears to be within acceptable limits. The IT team knows there is a problem but is helpless to root it out in time. By 10am the business is suffering intermittent outages and social media channels are abuzz with customers reporting issues. By the next day, news of the debacle is all over the press. The result: damage to the business, brand reputation and customer loyalty, and lost revenue.

This is a true story, and sadly one of many. In the scenario above, the problem was caused by a knock-on effect commonly known as ‘noisy neighbour’ syndrome: within the complexity of today’s shared data centre infrastructure, multiple applications contending for the same system components can degrade another application’s ability to service its requirements. This is a real threat to the proper functioning of the business. Unfortunately, it is just one of a myriad of problems capable of bringing a business grinding to a halt.

The scaling of today’s highly complex, virtualised data centre only further exacerbates the stresses placed on the environment. Continuing to manage these multi-vendor, hybrid environments with traditional, silo-centric legacy monitoring tools is not only becoming an impossible task; it is resulting in missed SLAs. Enterprises’ vulnerability to this ticking time bomb is evidenced in a recent global survey carried out by Dimensional Research, in which IT professionals and executives worldwide were asked about their experiences of application performance outages and slowdowns, the IT tools they use, and issue resolution. The primary goal was to understand the frequency and causes of application performance issues and outages at large enterprises, along with the IT tools used to monitor and remediate them.

One of the main takeaways from the research is that enterprises are struggling to assure the performance and availability of their business-critical applications. It highlights serious monitoring challenges and the detrimental impact these are having on the business. Particularly striking is the frequency of incidents, with almost 2/3 of respondents (61%) reporting four or more significant application outages and/or slowdowns every year.

Inadequate monitoring

Due to increasing complexity within data centres, the issues enterprises face stem from a lack of infrastructure visibility. The majority of monitoring tools cannot give IT teams, infrastructure managers, application owners, storage teams and executives the insight they need into the health and utilisation of the entire infrastructure. The survey feedback reflects this inadequacy: nearly 3/4 of respondents (71%) use more than five IT infrastructure monitoring tools, yet more than half (54%) say they still lack full application and infrastructure visibility. It is precisely this ‘blind spot’ that triggers the chain reactions putting organisations at risk of application slowdowns and outages.

Hunting for the root cause: infrastructure blindness

The research also exposed that 59% of respondents admitted their application performance issues are related to the underlying infrastructure. Because IT teams have no real visibility of this layer, they typically operate in ‘reactive mode’ rather than proactively predicting and avoiding potential issues before they impact service. Many enterprises have no common understanding of how applications relate to the underlying infrastructure that supports and delivers them, because they mistakenly view applications and infrastructure as two disparate entities. With no end-to-end, down-the-stack visibility of the entire environment, the result is a vicious cycle of availability and application performance issues.
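The ‘common understanding’ the article describes can be as simple as an explicit map from each application to the infrastructure components it depends on. The sketch below (all component and application names are hypothetical) shows how such a map lets a team trace an alert on a shared component straight to the applications it affects, instead of guessing.

```python
# Hypothetical app-to-infrastructure dependency map: the shared view
# the article says many enterprises lack.
app_topology = {
    "payments-api":  ["vm-cluster-a", "san-array-1", "core-switch-2"],
    "credit-cards":  ["vm-cluster-b", "san-array-1", "core-switch-2"],
    "batch-reports": ["vm-cluster-b", "san-array-2", "core-switch-1"],
}

def apps_affected_by(component: str) -> list[str]:
    """Return every application that depends on the given component."""
    return [app for app, deps in app_topology.items() if component in deps]

# An alert on the shared array immediately names the applications at risk.
print(apps_affected_by("san-array-1"))  # ['payments-api', 'credit-cards']
```

Note how two applications share `san-array-1`: the same shared-dependency structure that produces the noisy-neighbour effect described earlier.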

Delayed resolution

On the topic of issue resolution, 3/4 of respondents confirmed that they are unable to consistently identify application issues within 24 hours. In this scenario, finger pointing often ensues, placing enormous stress on the shoulders of IT teams who are ill equipped to manage the inevitable performance and availability fluctuations of the infrastructure, and therefore unable to do their jobs properly. This almost certainly has a detrimental impact on the motivation of the staff involved, who may be trying their best to resolve the issue as quickly as possible, but without the necessary application and infrastructure diagnostics. The research reflects this: almost 2/3 of respondents often feel they are held personally responsible for application outages and slowdowns, and with the number of applications increasing, 65% reported concern about the perceived value of the internal IT infrastructure team to the business.

More than half of respondents (51%) revealed that there is no collaborative approach between operations, application and engineering teams in evaluating application performance requirements. Without this interaction, there is a lack of shared information and communication on exactly how applications may impact the infrastructure at any given time, leaving both the teams and the business in an unpredictable and precarious position.

In addition, little is currently being done by way of scenario planning for potential resource conflicts, or testing of the impact of new technology on existing systems. According to the survey, over half of respondents (51%) confirmed that their enterprises fail to test or simulate projected application growth for performance capacity planning. Without sufficient load-testing insight from the perspective of the application, how can corporations ever get a handle on the tipping point at which application growth results in latency spikes or sustained slowdowns?
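The ‘tipping point’ can be illustrated with a back-of-the-envelope capacity model. The sketch below uses the classic M/M/1 queueing approximation (an assumption for illustration, not the article’s method, and the capacity figure is hypothetical): response time stays flat at moderate load, then blows up as utilisation approaches 100%, which is exactly why capacity planning needs projected-growth testing rather than extrapolation from today’s comfortable numbers.

```python
# Illustrative M/M/1 approximation: mean response time vs offered load.
def response_time_ms(arrival_rate: float, service_rate: float) -> float:
    """Mean response time in ms for an M/M/1 queue (rates in req/s)."""
    if arrival_rate >= service_rate:
        return float("inf")  # saturated: latency grows without bound
    return 1000.0 / (service_rate - arrival_rate)

service_rate = 1000.0  # hypothetical system capacity: 1000 req/s
for load in (500, 800, 950, 990):
    print(f"{load} req/s -> {response_time_ms(load, service_rate):.1f} ms")
```

Doubling load from 500 to 990 req/s here multiplies latency fifty-fold (2 ms to 100 ms): the spike arrives long before the system is nominally ‘full’.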

Missed SLAs

Nearly 9/10 of respondents (89%) revealed they are unable to consistently meet their SLAs for mean time to resolution (MTTR) of these issues, and 3 out of 4 companies fail to consistently identify root cause within 24 hours, exposing their organisations to substantial risk.
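For readers less familiar with the metric, MTTR is simply the average time from incident to resolution, tracked against an SLA target. A minimal sketch (the incident durations and the 24-hour target are illustrative, not survey data):

```python
# Hypothetical incident log: hours from detection to resolution.
incidents_hours = [3.5, 30.0, 6.0, 48.0, 2.0]
sla_target_hours = 24.0

# MTTR is the mean time to resolution across incidents.
mttr = sum(incidents_hours) / len(incidents_hours)
# SLA compliance: fraction of incidents resolved within the target.
within_sla = sum(h <= sla_target_hours for h in incidents_hours) / len(incidents_hours)

print(f"MTTR: {mttr:.1f} h; {within_sla:.0%} of incidents met the {sla_target_hours:.0f} h SLA")
```

Note that the two numbers can diverge: a handful of long-running unresolved incidents, of the kind the survey describes, drags the mean up even when most incidents close quickly.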

Customer dissatisfaction

Almost 4/5 of respondents (79%) confirmed that these application outages and slowdowns directly affect customers. Unfortunately, the outlook doesn’t look much better: nearly 2/3 of respondents (62%) lack confidence that their current IT infrastructure will be able to meet their organisation’s application performance needs over the next two years. With an application-focused approach to the infrastructure, organisations can get ahead of the game, avoiding interruptions to service and anticipating customer needs.

Applications and infrastructure alignment

As a starting point, applications should be placed at the heart of IT strategy, as they are critical to running the business and are entirely dependent upon the health of the infrastructure. Enterprises should deploy cross-domain monitoring tools and a best-practice approach that views the infrastructure within the context of the application.

With the appropriate monitoring tools, IT, application and operations personnel can gain a deeper understanding of the impact of their applications on the underlying infrastructure. Without them, they are operating in the dark, unarmed with the insights required to avoid business-impacting slowdowns or outages, and unaware of the potential crisis that could so easily befall them.

As concerning as the situation is, CIOs and operations teams should view these insights in a positive light. By acknowledging the need for the holistic infrastructure visibility they have been missing, and by taking an application-centric approach to implementing an IT infrastructure monitoring platform, enterprises will be in a position to meet their SLAs, eradicate downtime and improve business success.

Sean O’Donnell, Managing Director EMEA at Virtual Instruments

Sean O’Donnell
Sean O’Donnell is Virtual Instruments’ EMEA managing director. Virtual Instruments is the industry’s first application-centric infrastructure performance management provider for the hybrid data centre. Its vendor-independent solutions deliver a unified real-time view of infrastructure performance in service of enterprise applications, whether they are deployed on-premises or in the cloud.