Hunting ghosts and goblins plaguing enterprise networks

Today’s enterprise networks are a haunted house of application errors, latency, jitter, and packet loss. Traditional network performance monitoring tools and dashboards are not equipped to analyze all of the nooks and crannies where spooks may be lurking. Instead, they usually feed a goopy stream of metadata or NetFlow information into a database, leaving these tools well behind the action, with results too slow and inaccurate to exorcise the problems.

So, why are conventional tools inadequate in the fight against network gremlins, and what do NetOps teams really need to effectively hunt down and eliminate these ghosts and goblins from their systems? 

The Three Ugly Gremlins: Jitter, Latency, and Packet Loss 

Jitter, latency, and packet loss are common on any network, but they love to wreak havoc with VoIP and other Unified Communications (UC) applications on converged networks. This is exactly where effective Network Performance Monitoring (NPM) tools should act as a crystal ball, providing the visibility and analytics to help find and expel problems.
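To make the jitter gremlin concrete, here is a minimal Python sketch of the smoothed interarrival-jitter estimator that RFC 3550 defines for RTP streams such as VoIP. The timestamps below are invented for illustration; a real NPM tool would pull them from captured packets.

```python
# Minimal sketch: RFC 3550-style interarrival jitter for an RTP/VoIP stream.
# Timestamps are invented; a real tool would take them from captured packets.

def interarrival_jitter(send_times, recv_times):
    """Smoothed estimate of packet-delay variation, per RFC 3550."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Transit-time difference between consecutive packets.
        d = abs((recv_times[i] - send_times[i])
                - (recv_times[i - 1] - send_times[i - 1]))
        # Exponential smoothing with gain 1/16, as the RFC specifies.
        jitter += (d - jitter) / 16.0
    return jitter

# 20 ms packet spacing on send; the receive side wobbles (that's the jitter).
send = [0.000, 0.020, 0.040, 0.060, 0.080]
recv = [0.050, 0.072, 0.088, 0.115, 0.128]
print(f"estimated jitter: {interarrival_jitter(send, recv) * 1000:.2f} ms")
```

A steady stream shows near-zero jitter; the wobblier the receive timestamps get, the higher the estimate climbs, which is exactly the signal a VoIP quality monitor watches.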

One of these three gremlins, latency, is becoming a particularly devilish problem for many IT professionals to solve, for one very good reason: enterprises today rely so heavily on cloud-based tools and data storage that when ghoulish latency problems strike, even the Ghostbusters would have a hard time capturing them. 

Bats in the Belfry 

We know that enterprises aren’t going back to the old days of centrally located data and applications any time soon, but why can’t traditional NPM tools keep up? A lot of the blame for this scary state of affairs can be laid at the feet of NetFlow-based solutions. While they were great in their day, they haven’t kept up with the times, and today they can’t provide the kind of detailed information about distributed and cloud-ified networks that IT professionals truly need. It’s time for them to shuffle off to the graveyard. 

Here’s the problem in a nutshell. Despite their popularity, traditional flow-based tools have to rely on data that is unidirectional and often incomplete. That kind of analysis doesn’t provide the level of detail needed to troubleshoot a network’s zombie anomalies, particularly when those anomalies are intermittent and erratic. Sampling network data with sFlow isn’t going to save you, either. Although sampling theoretically enables NPM tools to keep up with faster networks, it can skew the data or let sneaky goblins slip by unnoticed. In other cases, the information needed to troubleshoot a haunting can only be found in the payload, while NetFlow relies on the packet header. Another issue is speed: by the time flow records have been compiled, exported to a flow collector, stored, and analyzed, it may be too late… 
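To put a number on how much sampling can hide, here is a quick back-of-the-envelope sketch: under 1-in-N random packet sampling, a burst of k packets escapes notice entirely with probability (1 - 1/N)^k. The sampling rates and burst sizes below are illustrative assumptions, not figures from any particular product.

```python
# Back-of-the-envelope: odds that 1-in-N packet sampling misses a short
# burst entirely. N and k are illustrative, not vendor measurements.

def miss_probability(sample_rate_n: int, burst_packets: int) -> float:
    """P(no packet of a k-packet burst is sampled) under 1-in-N sampling."""
    return (1 - 1 / sample_rate_n) ** burst_packets

for n in (256, 1024, 4096):      # plausible sampling rates
    for k in (10, 100):          # a tiny goblin vs. a modest one
        p = miss_probability(n, k)
        print(f"1-in-{n:<5} sampling, {k:>3}-packet burst: "
              f"{p:.1%} chance it's never seen")
```

At 1-in-4096 sampling, even a 100-packet burst slips through unseen more than 97 percent of the time, which is why short, intermittent hauntings rarely show up in sampled data.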

Cobwebs on the Tools  

NetFlow-based NPM tools are everywhere. They’ve proven their value and do a lot of things well. But here’s why IT professionals also need to up their game by including tools that provide an even more accurate and up-to-the-minute view of what’s really happening on their networks: 

1. No measurement of latency, quality, utilization, and saturation:  Although it provides a lot of valuable network information, NetFlow doesn’t measure latency, quality, network utilization, or saturation. Without those, not even a crystal ball can help you understand exactly what users are experiencing. 

2. Time granularity:  NetFlow generally has 5-minute granularity. Because each record is a snapshot of a unidirectional flow, delivered well after the fact, that ghostly delay makes it difficult to isolate intermittent problems (see the sketch after this list). 

3. Application granularity:  NetFlow sources are usually port-based. Popular browser-based applications such as Facebook and Twitter will therefore be lumped together as port 80/HTTP traffic, while critical business applications may materialize as random port numbers. Your application will just be lost in the zombie horde. 

4. Physical network bias:  As enterprises rely more on hosted data and applications, effective network monitoring across those environments needs visibility into virtual, cloud and SDN networks. Traditional tools using NetFlow may leave you searching blindly in the dark.    

5. Lacking scalability:  NetFlow can monitor traffic on a specific router or switch, but scaling that across a large network is costly and can become scarily complicated. 

6. Inconsistent data sources:  Considering that NetFlow is proprietary, you’d assume that all NetFlow sources are created equal. Wrong. Older routers use different formats and provide different granularity than newer ones. This inconsistency in versions, granularity, and depth is enough to make you scream. 

7. Burden on network & equipment:  One of NetFlow’s spookiest drawbacks is that its own data can flood the network, potentially making a problem worse. Compressing the data can help, but it adds more complexity to network visibility. sFlow creates less data, but that again comes at the cost of reduced visibility and coarser granularity. 
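To see why the time granularity in point 2 matters, consider a hypothetical link where a 30-second saturation spike gets averaged into a single 5-minute record; every number below is invented for illustration.

```python
# Illustration for point 2 above: a 30-second, 95%-utilization spike on an
# otherwise quiet link vanishes into a 5-minute average. Numbers invented.

samples = [0.10] * 270 + [0.95] * 30   # 300 one-second utilization samples

five_min_avg = sum(samples) / len(samples)
peak = max(samples)

print(f"peak utilization:      {peak:.0%}")          # 95% -- the real problem
print(f"5-minute flow average: {five_min_avg:.0%}")  # ~19% -- looks healthy
```

The dashboard fed by the 5-minute record reports a perfectly healthy link, while users lived through half a minute of saturation. That is the ghostly delay in action.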

The Devil in the Details 

In a haunted house, not knowing what’s around the next corner can be the worst feeling of all. It’s the same on an enterprise network; without the right tools, we can’t divine a network’s problem nooks and crannies. Having the right tools for the job is like having full disclosure. We know exactly what we’re facing—no matter how scary it may be—so that we are better prepared for battle. Here’s what we really need in our NPM tools: 

1. Real-time:  If you get the shivers just thinking about delayed dashboards, you might need a packet-based NPM solution. Only packet data delivers eerily quick visibility and enables frighteningly actionable insight into the quality of a network’s data and performance. 

2. Look both ways!  Rather than a one-way conversation, we need a packet-based solution that sheds light on the two-way conversations between client and server, two IP addresses, etc. This view of the entire transaction makes it possible to sniff out and neutralize issues lurking in the shadows, such as latency (see the sketch after this list). 

3. All network layers, L2–L7:  A packet-based NPM solution is like a magic mirror for IT teams. It looks at all of the network layers, from the data link and network layers up through the transport, session, presentation, and application layers. 

4. 100% accuracy:  Don’t rely on tarot cards. A packet-based solution delivers a 100 percent accurate view of network health and performance. Need help at the scene of a network crime? Forensic investigations are a snap thanks to its inherently trustworthy data. 

5. Minimal network impact:  If the life of your network is being sucked dry, a packet-based NPM solution, with minimal impact on network speed, may be the answer. 

6. Monitor AND troubleshoot:  Accurate, real-time information about the network from a packet-based solution lets IT pros stay on the lookout for nasty apparitions while maintaining network health. And if any gremlins happen to strike, they won’t stand a chance. 
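As a taste of what looking both ways (point 2 above) buys you, here is a minimal sketch of deriving round-trip latency from both directions of a TCP handshake in a packet capture. The packet tuples and timestamps are hypothetical stand-ins for what a capture tool would produce.

```python
# Sketch for point 2 above: measuring round-trip latency from a two-way
# TCP handshake seen in a packet capture. Packets and times are invented.

from typing import NamedTuple

class Packet(NamedTuple):
    ts: float       # capture timestamp in seconds
    src: str
    dst: str
    flags: str      # simplified TCP flags

capture = [
    Packet(0.0000, "10.0.0.5", "192.0.2.10", "SYN"),
    Packet(0.0420, "192.0.2.10", "10.0.0.5", "SYN-ACK"),
    Packet(0.0421, "10.0.0.5", "192.0.2.10", "ACK"),
]

def handshake_rtt(packets):
    """Round-trip time between the client's SYN and the server's SYN-ACK."""
    syn = next(p for p in packets if p.flags == "SYN")
    syn_ack = next(p for p in packets
                   if p.flags == "SYN-ACK"
                   and p.src == syn.dst and p.dst == syn.src)
    return syn_ack.ts - syn.ts

print(f"network round-trip latency: {handshake_rtt(capture) * 1000:.1f} ms")
```

A unidirectional flow record can never pair the SYN with its SYN-ACK, which is exactly why latency stays invisible to flow-only tools and why the two-way view matters.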

Happy Halloween, everyone! 

Jay Botelho, Senior Director of Products at Savvius 

Image Credit: Sergey Nivens / Shutterstock