Many, if not all, IT professionals find alerts annoying. They are irritating, often useless and sometimes false. So if you’re an IT professional who specialises in IT monitoring, and you’ve spent a good amount of time formulating an alert, it’s never nice to find out it’s falling on deaf ears.
In that heart-breaking moment of realisation that the IT team barely bats an eye at your carefully constructed alerts, you’ve probably considered capturing the team’s attention with an alternative alert message using one of the following methods:
- Sarcasm: Hey, server team, don’t worry, I’m sure this alert isn’t for anything important… You just carry on ignoring it.
- Exaggeration: DANGER, SERVER TEAM! Router will DISINTEGRATE in 5 seconds!
- Sympathy (or just pathetic): Hey, I’m the IIS server and it just got really dark and cold in here. Can someone come turn the lights back on? I’m afraid of the dark.
Or perhaps you’ve gone down the clickbait route…
- This server’s response time dropped below 75 per cent. You won't believe what happened next…
- We showed these sysadmins the cluster failure at 2:15 a.m. Their reactions were priceless.
- Three naughty long-running queries you never hear about.
- Watch what happens when this network outage causes a mass data loss. The results will shock you!
Although gimmicks, bribery and guilt-trips are tempting, they miss the point. IT monitoring specialists need to craft meaningful alerts which, when heard, are easily actionable. Let me take you through a few common alerting problems, why they happen, and how to better tailor alerts to meet the needs of the IT pro.
Problem: A key device goes down - for example, the edge router at a remote site - and the team gets clobbered with alerts for every device at the site.
‘Down’ doesn’t always mean that the device is actually down. In many cases it means that visibility has been lost because a device upstream is down – which brings monitoring of everything behind it to a halt until the upstream device is back up.
Most monitoring solutions have an option to suppress alerts triggered by ‘upstream’ or ‘parent-child’ relationships. First make sure the monitoring system understands the device dependencies in your environment, then turn this option on – this will help ensure you only get the alerts you need.
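To make the parent-child idea concrete, here is a minimal sketch of that kind of suppression in Python. It is not how any particular monitoring product implements it; the function name `root_cause_alerts`, the `parents` map and the device names are all hypothetical, and it assumes each device has at most one upstream parent.

```python
from typing import Dict, List, Optional, Set


def root_cause_alerts(parents: Dict[str, Optional[str]],
                      unreachable: Set[str]) -> List[str]:
    """Return only the unreachable devices with no unreachable ancestor.

    `parents` maps each device to its upstream device (None for the top).
    Devices hidden behind a failed parent are suppressed, because their
    'down' status may only mean that visibility was lost.
    """
    alerts = []
    for device in sorted(unreachable):
        seen = {device}              # guards against a mis-configured loop
        parent = parents.get(device)
        suppressed = False
        while parent is not None and parent not in seen:
            if parent in unreachable:
                suppressed = True    # an upstream device explains this one
                break
            seen.add(parent)
            parent = parents.get(parent)
        if not suppressed:
            alerts.append(device)
    return alerts


# Hypothetical remote site: everything sits behind the edge router.
parents = {
    "edge-router": None,
    "core-switch": "edge-router",
    "file-server": "core-switch",
    "printer": "core-switch",
}
unreachable = {"edge-router", "core-switch", "file-server", "printer"}
print(root_cause_alerts(parents, unreachable))
```

With the whole site unreachable, only the edge router is reported. If the printer alone stops responding, the printer is reported, because nothing upstream of it explains the failure.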
Problem: Certain devices trigger at certain times because the work they’re doing causes them to “run hot.”
A high-usage alarm can go off for activity that is well above the regular run rate yet completely normal, such as month-end report processing. The problem is that some ‘high usage’ alerts are perfectly fine; however, if you raise the threshold to silence them, you may miss important issues on systems that genuinely cannot cope with the load.
In this instance, you need to use monitoring data to your advantage. Monitoring is not an alert or blinky dot on the screen. Instead it is a regular, ongoing collection of a set of metrics from a set of devices. Alerts, emails and blinky dots are just a by-product you enjoy when monitoring is done correctly.
Therefore, in order to get around this problem, you need to take into account what ‘normal’ looks like for each device and build your alerts around this. This ‘baseline’ reflects an overall average, normal run rate per day and even per hour. Once this baseline is formed, you can easily set up alerts which “alert when CPU % utilisation is >= 10% over the baseline for this time period”, ensuring you only get the alerts which matter.
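As a rough illustration of a baseline-relative alert, the sketch below averages historical CPU readings per hour of day and flags a reading only when it sits 10 or more above that hour's average. The names `build_baseline` and `should_alert` and the sample figures are made up for the example. It also assumes that "10% over the baseline" means 10 percentage points and that hour of day is a good enough time period; a real baseline would likely also account for day of week or month-end.

```python
from collections import defaultdict
from statistics import mean


def build_baseline(samples):
    """Average CPU % per hour of day from (hour, cpu_percent) samples."""
    by_hour = defaultdict(list)
    for hour, cpu in samples:
        by_hour[hour].append(cpu)
    return {hour: mean(values) for hour, values in by_hour.items()}


def should_alert(baseline, hour, cpu_percent, margin=10.0):
    """True when the reading is at least `margin` points over the baseline."""
    expected = baseline.get(hour)
    if expected is None:
        return False  # assumption: stay quiet when there is no history
    return cpu_percent >= expected + margin


# Hypothetical history: this server runs hot at 02:00 (batch reports)
# and is mostly idle at 14:00.
history = [(2, 80), (2, 84), (14, 20), (14, 24)]
baseline = build_baseline(history)

print(should_alert(baseline, 2, 85))   # hot, but normal for 02:00
print(should_alert(baseline, 14, 85))  # same reading, abnormal at 14:00
```

The same 85% reading is ignored during the nightly batch window and raised in the afternoon, which a single fixed threshold cannot do.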
Watch what happened when IT pros tried these weird monitoring tricks… you won’t believe what happened next!
IT monitoring specialists need to use the capabilities of their monitoring solutions to the fullest. When alerts become less frequent and more specific, the ‘boy who cried wolf’ effect disappears: alerts are taken seriously and IT professionals react to them in a timely fashion. In turn, everyone begins to experience the true value of good monitoring, which allows the IT monitoring specialist to create more alerts that build insight and constantly improve the environment.
“IT monitoring specialist saves company £££” isn’t clickbait – with some rational thought, planning and implementation when it comes to monitoring, it can be the truth for every business.
Leon Adato, Head Geek, SolarWinds