A day in the life of an IT pro: The contradiction of monitoring [part 1]

Recently an IT friend of mine told me that for the past five years he has been longing for a tool which could alert him when non-routable interfaces went down.

To be perfectly honest, my heart went out to the guy. That’s such a basic monitoring request; why, I wondered, hasn’t it been fulfilled? But when I thought about other monitoring requests I’ve heard over the years, I realised there is a major contradiction when it comes to IT pros and monitoring.
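
To show just how basic that request is, here is a minimal Python sketch of the check itself. It assumes that "non-routable" means an interface carrying an RFC 1918 private address, and it works on a hypothetical list of interface records. The `Interface` class, the `down_non_routable` function and the sample data are all made up for illustration; a real tool would first have to collect the status from the devices.

```python
import ipaddress
from dataclasses import dataclass

# RFC 1918 private ranges; treating these as "non-routable" is an assumption.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


@dataclass
class Interface:
    """Hypothetical record for one polled interface."""
    name: str
    address: str
    oper_status: str  # "up" or "down"


def is_non_routable(address: str) -> bool:
    """Return True if the address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


def down_non_routable(interfaces):
    """Return the names of interfaces that are down and non-routable."""
    return [
        iface.name
        for iface in interfaces
        if iface.oper_status == "down" and is_non_routable(iface.address)
    ]


# Illustrative sample data, not taken from any real network.
sample = [
    Interface("eth0", "8.8.8.8", "down"),      # public address, ignored
    Interface("eth1", "10.0.0.5", "down"),     # private and down
    Interface("eth2", "192.168.1.1", "up"),    # private but up
    Interface("eth3", "172.16.4.2", "down"),   # private and down
]

for name in down_non_routable(sample):
    print(f"ALERT: non-routable interface {name} is down")
```

The logic is only a few lines long. The hard part is everything around it: collecting the data reliably and getting the alert to the right person.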

On the surface, no IT professional will accept sub-optimal performance or functionality from any technology. IT pros overclock their systems, exploit back doors, root phones to get the latest (or unsupported) version, and memorise complex key combinations to access god-mode. We won’t settle for anything but the best and do our utmost to support it. However, when it comes to monitoring, this is not the case. I’m constantly meeting network engineers and sysadmins who install monitoring functions which completely fail to do the job required. What is more upsetting is how many IT pros accept this as de rigueur: there is no sense of urgency to demand better monitoring tools.

I’ve therefore decided to tackle this problem myself in this two-part series, to help identify why IT pros are putting themselves through this.

Home-grown vs corporate monitoring solutions

One of the first issues with monitoring stems from how the tool is implemented in an organisation. The two most popular routes are a home-grown solution and a corporate project. Both have benefits, but neither will always offer the best results.

Home-grown monitoring solutions are often created on the side and on the cheap with a ‘that will do for now’ mind-set. On the plus side, the price is right and the solution will do exactly what the user needs (give or take, depending on their skill). On the downside, these types of solutions won’t grow or improve without a crisis. The person who created it will rarely touch it again unless it’s broken because, well, frankly, they have more important things to do than spend their time adapting a so-so monitoring solution. This ethos immediately leaves the organisation stuck with an average monitoring tool that can more or less do the job at hand, but to no exceptional standard.

On the flip side, there are corporate-sponsored projects to implement an enterprise-wide monitoring solution. These are usually sponsored by executives who, following a fancy presentation from a vendor complete with numbers and buzzwords, believe they have The Solution and that any naysaying on the part of staff is just resistance to change. And what I’ve learned over the years is that when a sales guy comes and says ‘ROI’, ‘efficient’ and ‘speed’ multiple times to an executive, it’s hard for them not to fall in love with whatever product is being sold. The knock-on effect is that the tool then becomes that executive’s metric for success. And so, since these solutions are typically not cheap, there is a huge push not only to get it implemented (come hell or high water) but also to get it working everywhere with everything. Because, gosh darn it, we need to get our money’s worth out of this thing! Subsequently the tool, whatever its strengths may be, gets shoehorned into various situations it was never meant for.

In this instance you’re left with a great tool that isn’t being used in the way it was meant to be and is owned by teams who have no vested interest in it. Thus much of the potential this monitoring tool could offer an organisation goes to waste.

Do monitoring professionals even exist?

Another challenge in the monitoring “space” is that it’s difficult to get specialists from all of the relevant teams (network, systems, virtualisation, storage, etc.) to agree on a set of metrics for monitoring. As difficult as that is, it’s even trickier to find a monitoring professional who is familiar with all the different areas and has the knowledge to confidently mediate between the different teams to find a solution that works for all. To be quite frank, it can be difficult to find a monitoring professional at all.

Most organisations make do with a consultant who is an expert in a particular toolset. With larger solutions, even this becomes a challenging proposition, as experts sub-specialise in various aspects of the tool. If the expert is keen on code, every problem looks like it needs a scripting solution. Protocol-centric pros will leverage SNMP, WMI, or the like. And so on.

The net effect is that the monitoring options offered to technicians may be feature-sparse (if the software is home-grown), or may undertake the wrong tasks, or undertake them the wrong way (if it’s a corporate-sponsored solution).

Thus implementing a monitoring solution that is robust enough to deal with the challenges and demands of multiple teams within a business is, and this is a gross understatement, hard.

The Path to the Promised Land

Implementing a monitoring system that covers all the bases and gives teams and organisations the information and responses they need requires several things. It requires multiple tools, often from multiple vendors (though few vendors will admit this). It requires technical leadership who can get the right teams into the room, start the right conversations, and help them to answer the right questions. And it requires organisations to commit to the level of expertise needed, both during the implementation phase and in subsequent usage, to make any set of solutions truly effective.

Clearly it’s harder to do this than people realise as, based on discussions with my colleagues, customers, and coworkers, it looks like many IT pros have given up on it altogether. In the face of this kind of challenge, many just accept the status quo of monitoring because they feel too many variables are out of their hands.

However, my view is that this is categorically unacceptable for any organisation, whatever its budget, industry or size. When monitoring is done right it can be a powerful force within an organisation. It enables not just the remediation of downtime, but the avoidance of it; it drives performance improvement rather than just tracking degradation; it helps organisations avoid unnecessary spending rather than just being a sunk cost; and it boosts the morale of engineers, who know they can rely on something besides their own eyes and gut to tell them that everything is working as it should, and to lead them quickly and accurately to the root cause of a problem when it’s not.

This Promised Land I am describing does exist “out there” and can exist in your company, and I’ll tell you how to get there in my next post.

Kent Row, IT admin and superhero, SolarWinds

Image source: Shutterstock/ra2studio