There was a time in the not-too-distant past when every IT team could map a clearly defined network path between the enterprise and its data centre. They were able to control and regulate the applications that ran on internal systems because they were the ones responsible for hosting the data on their own servers, and their users typically worked within an office environment where they used client-based software without a need to access a cloud. Back then, the task of monitoring network performance and troubleshooting latency was far easier. It should come as no surprise, then, that the proliferation of SaaS applications and cloud services and storage options over the past decade means that pinpointing enterprise network issues has become a big headache again – and another exercise in finger-pointing.
Whenever an enterprise outsources data or applications to an external resource in the cloud, it’s effectively adding a third party into the mix of network variables, which in turn introduces another potential weak point that may impact network performance. Thankfully, these services normally work reliably, but you can never overlook the fact that the app or data is generally reached over the internet rather than a dedicated line, meaning that there are many parts of the puzzle that the IT team no longer has much control over. To get around this problem, a lot of companies rely on expensive, tunnelled MPLS links, while others pay for connections from multiple vendors as a failsafe. Still others use tools at the firewall that can intelligently choose which connection is performing best, so that traffic can be prioritised to travel over that connection.
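To make the idea of choosing the best-performing connection concrete, here is a minimal sketch of the selection logic. It assumes recent round-trip samples are already available for each link; the link names and figures are hypothetical, and real products will weigh loss, jitter and cost as well as latency.

```python
from statistics import median


def pick_best_link(samples_ms):
    """Return the name of the link with the lowest median round-trip time.

    samples_ms maps a link name to a list of recent RTT samples in ms.
    Links with no samples are ignored.
    """
    usable = {name: s for name, s in samples_ms.items() if s}
    if not usable:
        raise ValueError("no latency samples available")
    # The median is less sensitive than the mean to a single slow sample.
    return min(usable, key=lambda name: median(usable[name]))


# Hypothetical measurements for three uplinks at one office.
links = {
    "mpls": [18.0, 19.5, 18.2],
    "broadband_a": [12.1, 55.0, 13.0],
    "broadband_b": [25.0, 24.0, 26.0],
}
best = pick_best_link(links)
```

Using the median rather than the mean means one spike on an otherwise fast link does not push traffic onto a consistently slower one.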
As mentioned above, one solution is to lease a dedicated tunnel. Many larger companies do this to ensure the operation of their data centre connection out to the internet or into their headquarters. Leasing these tunnels is quite a financial burden, so most enterprises can’t afford to do it for multiple remote offices or branch locations. This means that although data going into the HQ can be monitored and controlled, the data from Salesforce, Slack, Office 365, WebEx, Citrix and other SaaS applications is no longer being funnelled through the enterprise data centre, because the data isn’t hosted there to begin with. The office itself may have quite a robust connection, but many people may be working from home using a Comcast or AT&T connection to access their data or applications hosted on services that can include AWS, Azure or Google.
As you can imagine, with all of this added complexity involved in network traffic delivery today, it’s not surprising that latency is becoming more common and increasingly difficult to troubleshoot and stamp out. So, is there a solution?
Network Performance Monitoring at multiple locations
One thing enterprise IT teams should be doing more is monitoring within the data centre and at the connection to the internet, as well as using tools to monitor the connection at all of their remote locations. It makes sense because remote offices are where so many of the connections take place. In the past, engineers could use NetFlow or DPI (Deep Packet Inspection) at the data centre, knowing that most of the remote traffic hit their servers and went back out to their clients. Today, traffic is so much more distributed, with only a limited amount of remote data hitting those servers, that accurate monitoring from the data centre alone is far more difficult. It’s quite a shift in thinking compared with today’s practices, but it makes sense: if the data is distributed, then network monitoring needs to be, too.
Office 365 is a good example, because most of us use applications like Outlook, Word and Excel on a regular basis. As many organisations have shifted to Office 365 online, these applications connect to Azure rather than the data centre, so if the IT team doesn’t actively monitor the branch office, then they completely lose sight of the user experience at that location.
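As a rough illustration of what monitoring from the branch can look like, the sketch below times a TCP handshake from wherever it runs to a list of targets. It is only an assumption about how a lightweight probe might work, not a description of any vendor's tool; `tcp_connect_ms` and `probe_site` are hypothetical helper names, and handshake time is just one proxy for what users experience.

```python
import socket
import time


def tcp_connect_ms(host, port=443, timeout_s=3.0):
    """Return the time in ms to complete a TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.perf_counter() - start) * 1000
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return None


def probe_site(site_name, targets):
    """Measure every (host, port) target from the location running this code."""
    results = {f"{host}:{port}": tcp_connect_ms(host, port) for host, port in targets}
    return {"site": site_name, "results": results}
```

Run from each branch office against the SaaS endpoints its users depend on, for example `probe_site("branch-office", [("example.com", 443)])` with placeholder names, this gives a per-location view that a data centre probe cannot.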
Another major problem when trying to deal with latency is that our service providers often don’t give us enough information. Within any enterprise, if productivity takes a nosedive due to technical issues, that can quickly become very expensive, but at the moment, I don’t think SaaS vendors or infrastructure providers give users enough visibility into outages or performance issues. They should do more to give customers accurate, up-to-the-second reporting about performance and issues. This is a critical – but often overlooked – part of true network visibility.
Latency in the cloud
As I’ve mentioned, network performance can be impacted by so many variables. Troubleshooting within the enterprise is relatively easy when compared with today’s cloud and hybrid environments. When using cloud-based storage or SaaS vendors, your data naturally goes outside the controlled walls of your corporate environment. The data can be affected by jitter, the route the packets take and even compute speed, meaning that latency is a very serious possibility. For that reason, one way of minimising impact is to use cloud-based storage and SaaS vendors with physical locations close to where the data will be used. If you are based in London but need to pull data from Singapore, there will be some latency caused as the data goes through every switch and router along the way. Even in a parallel process, you may have thousands or millions of connections trying to get through, so your packets may be forced to wait in a queue for a tiny amount of time. Every microsecond they’re in that queue is a microsecond of delay, which starts mounting up over longer distances.
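The London to Singapore example can be put into rough numbers. The sketch below assumes a path length of about 10,850 km (roughly the great-circle distance; real cable routes are longer) and a signal speed in fibre of about 200,000 km/s. The hop count and per-hop queueing delay are illustrative assumptions, not measurements.

```python
def min_rtt_ms(distance_km, fibre_km_per_s=200_000.0):
    """Best-case round-trip time in ms from propagation delay alone."""
    return 2 * distance_km / fibre_km_per_s * 1000


def rtt_with_queueing_ms(distance_km, hops, queue_us_per_hop):
    """Add a fixed queueing delay (in microseconds) at every hop, both ways."""
    queueing_ms = 2 * hops * queue_us_per_hop / 1000
    return min_rtt_ms(distance_km) + queueing_ms


LONDON_SINGAPORE_KM = 10_850  # assumed approximate path length

floor_ms = min_rtt_ms(LONDON_SINGAPORE_KM)
# Assume 20 hops each way, each adding 50 microseconds of queueing.
loaded_ms = rtt_with_queueing_ms(LONDON_SINGAPORE_KM, hops=20, queue_us_per_hop=50)
```

Under these assumptions the physics alone costs a little over 100 ms per round trip, before any queueing is added, which is why placing data close to its users matters so much.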
A moving target
One of the first and most obvious casualties of latency is VoIP call quality, characterised by frustratingly delayed conversations. As more companies adopt VoIP and other UCaaS applications, this problem will continue to grow, but there are other places where the effects can also be seen. Latency causes data transfers to slow down. For some organisations this can lead to a host of cascading problems, especially when large data files or things like medical records are being transferred from one location to another. Latency can become a major headache for large data transactions such as database replication as well, causing regular processes to take much longer than anticipated.
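One reason latency slows transfers is that a single TCP connection can only have one window of unacknowledged data in flight per round trip, so throughput is capped at roughly the window size divided by the round-trip time. The sketch below works through that ceiling; the 65,535-byte window (the maximum without TCP window scaling) and the round-trip times are illustrative assumptions.

```python
def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on single-connection TCP throughput in Mbit/s."""
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000) / 1_000_000


WINDOW_BYTES = 65_535  # largest TCP window without window scaling

nearby_mbps = max_throughput_mbps(WINDOW_BYTES, rtt_ms=10)    # same region
distant_mbps = max_throughput_mbps(WINDOW_BYTES, rtt_ms=100)  # intercontinental
```

With the same window, a tenfold increase in round-trip time cuts the ceiling to a tenth, however much bandwidth the link itself offers. Larger windows raise the ceiling but do not remove the dependence on latency.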
To some extent latency can be mitigated, but it’s the unpredictability that’s difficult to manage. You may have heard a lot over the past year about the advantages of machine learning and AI in network management, but it’s difficult even for these cutting-edge tools to minimise latency. The problem stems from the fact that we cannot accurately predict when a switch or router will become overloaded. It might be just a millisecond or a hundred milliseconds of delay, but once the equipment is overloaded, all of the data gets stuck in a queue until it can be processed.
Most organisations today are constantly modernising through the adoption of a wide variety of SaaS applications and cloud solutions. Despite all of the benefits that these solutions provide, latency will continue to be a challenge unless corporate IT teams rethink their approach to network management. It all boils down to two things: more effective network monitoring, and a renewed focus on understanding end-user experience.
First, with so many applications now completely bypassing the data centre, it makes sense to monitor network traffic from multiple locations as standard practice. For that to happen, network performance vendors must develop more cost-effective tools that complement the traditional NPM tools installed in their data centres.
Second, IT professionals have to change the way they think about their network dashboards. Rather than just monitoring the global view of network health, they need the ability to see and react to whatever users are experiencing, in real time. When done correctly, this type of proactive monitoring and troubleshooting can help IT professionals resolve network or application problems caused by issues like latency before end users even realise that there was a problem.
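A simple way to picture a dashboard built around user experience rather than global averages is to flag each location whose worst-case latency crosses a threshold. The sketch below assumes per-site latency samples are already being collected; the site names, figures and the 150 ms threshold are hypothetical, and the 95th percentile is just one reasonable choice of measure.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer maths
    return ordered[rank - 1]


def sites_breaching(latency_by_site, threshold_ms):
    """Return the sites whose 95th-percentile latency exceeds the threshold."""
    return sorted(
        site
        for site, samples in latency_by_site.items()
        if samples and p95(samples) > threshold_ms
    )


# Hypothetical recent latency samples (ms) per location.
recent = {
    "london_hq": [20, 22, 21, 25],
    "leeds_branch": [30, 180, 200, 35],
}
alerts = sites_breaching(recent, threshold_ms=150)
```

A global average across both sites would look tolerable here, while the per-site percentile shows that users in one branch are having a much worse experience.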
Jay Botelho, senior director of products, Savvius