A Day in the Life of an IT Pro: Emailgeddon

In this ongoing series, Kent Row (seasoned IT admin evangelist) reveals the trials and tribulations of an IT professional. He wrote his own bio:

Kent Row is a seasoned evangelist for IT admins at SolarWinds. He is at the forefront of a technological age of hosting, firewalling, trolling, tweeting, blocking and CTRL-ALT-DELETING.

He is fluent in both technical and non-technical liaison; an organisational guru who has averted more crises than you’ve stacked applications. Don’t let his excellent communication skills fool you though – he has his very own privacy policy and will maintain customer confidentiality at all times.

When you’re starting out as an IT Pro, it can be a daunting task. Keeping on top of the ticket system and being on the end of the phone, email or messenger service for people who decide to disregard the ticketing system (we know this too well) can be extremely tricky, especially when there is a perception from the end user that “everything is urgent”... and it may very well be if it impacts productivity. These issues are heightened when more than one problem crops up at the same time: you are prepared to solve one particular problem, but unexpectedly find an entirely different one lurking in the depths of the IT infrastructure.

So, to best arm budding IT Pros against some of the main horrors of IT support, I thought I’d impart some wisdom or at least comfort in knowing you’re not the only one when it all goes wrong. Hopefully you can pick up some tips and swerve the same eventuality, instead of heading for the hills.

"Random Crisis Day" also known as "Every Other Day"

We begin our IT Pro journey of discovery on a rainy Monday in the Network Operations Centre, or NOC (pronounced “knock”), which, as you probably know, refers to one or more locations from which networks are monitored, controlled and managed. I had been set the task of answering some questions on traffic analytics implementation, before it became all too obvious that it was “Random Crisis Day.”

This was a new office with just over one hundred employees, new gear and a decent Wide-Area Network (WAN) link, but employees out in the branches weren’t able to send or receive emails. There was much running around and several more tea breaks than usual whilst helpless end-users contemplated the good old days of typewriters, the pigeon post, and, dare I say it, paper and ink. It was my duty to get these emails back on track. Countless employees were waiting to write their “hope you had a nice weekend” openers and by hook or by crook, I would grant that wish, and in turn, make my manager proud.

I assume you've tried turning it off and on?

The server admin bounced Exchange to no avail and was next checking various Exchange settings. And despite all the new hardware, they hadn’t fully implemented application monitoring, so we spent an hour manually combing settings and performance reports. The only thing that popped out was long send queue lengths and a low delivery rate. Memory, CPU and Windows performance counters all seemed OK.

Running out of ideas, I fell back to a continuous ping from a client PC and there it was: ridiculously slow response times, even for a WAN. About that time, the network guy asked if we were fixing “that pegged Network Interface Controller (NIC).” He apparently got an alert on his phone from his network monitor the day before. It turned out to be several workstations running BitTorrent and hogging the WAN link port.

Kent Row saves the day...again

And there we have it. While all Network Monitoring Solutions might be able to provide monitoring of business processes and the status of inter-related systems, you can’t ignore the foundation, otherwise the simplest of problems can become the hardest to resolve. Network monitoring must go beyond “ping”. It must also include bandwidth information, dropped and errored packets, the state of WAN interfaces, and the health of the network hardware itself, such as CPU and RAM usage.
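
To make "beyond ping" a little more concrete, here is a minimal Python sketch of the arithmetic a monitoring tool might do with two polls of an interface's counters. The `InterfaceSample` structure, the function names and the 90% threshold are all assumptions made for illustration, not any particular product's API; the counter values would come from whatever your own poller collects (SNMP, for instance).

```python
from dataclasses import dataclass

COUNTER32_MAX = 2**32  # 32-bit counters wrap back to zero at this value


@dataclass
class InterfaceSample:
    """One poll of an interface's counters (hypothetical structure)."""
    timestamp: float  # seconds since some fixed reference
    in_octets: int
    out_octets: int
    in_errors: int
    in_discards: int


def counter_delta(old: int, new: int, modulus: int = COUNTER32_MAX) -> int:
    """Difference between two readings, allowing for a single counter wrap."""
    return (new - old) % modulus


def utilization_percent(prev: InterfaceSample, curr: InterfaceSample,
                        link_bps: float) -> float:
    """Utilisation of the busier direction over the polling interval."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    in_bits = counter_delta(prev.in_octets, curr.in_octets) * 8
    out_bits = counter_delta(prev.out_octets, curr.out_octets) * 8
    return 100.0 * max(in_bits, out_bits) / (elapsed * link_bps)


def is_pegged(prev: InterfaceSample, curr: InterfaceSample,
              link_bps: float, threshold: float = 90.0) -> bool:
    """True when utilisation reaches the (assumed) alert threshold."""
    return utilization_percent(prev, curr, link_bps) >= threshold


def new_problem_packets(prev: InterfaceSample, curr: InterfaceSample) -> int:
    """Errored plus discarded inbound packets seen since the previous poll."""
    return (counter_delta(prev.in_errors, curr.in_errors)
            + counter_delta(prev.in_discards, curr.in_discards))
```

On a 10 Mbps link, 75,000,000 octets received in 60 seconds works out to 100% utilisation, which is the sort of figure that would have pointed at the WAN link long before anyone bounced Exchange.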

Some NMS solutions will offer real-time analysis of the packets on the network, calculating the time it takes for a user to get information back from an internal system like ERP. With these kinds of techniques, you can easily answer the question “is the problem (slow response) the application, or the network?” and you can resolve the issue that much faster.
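
As a rough illustration of how that question can be answered, the Python sketch below splits a measured response into a network share and an application share. It assumes you already have two timings for a request: the TCP handshake time (roughly one network round trip) and the time to first byte of the response (roughly one round trip plus server processing). The names and the simple "whichever is larger" rule are hypothetical simplifications, not how any specific NMS does it.

```python
from typing import NamedTuple


class ResponseBreakdown(NamedTuple):
    network_ms: float      # approximated by the TCP handshake time
    application_ms: float  # what is left of time-to-first-byte after the round trip


def split_response_time(handshake_ms: float,
                        first_byte_ms: float) -> ResponseBreakdown:
    """Split a response into network and application shares (an approximation)."""
    if handshake_ms < 0 or first_byte_ms < 0:
        raise ValueError("timings must be non-negative")
    # Time to first byte includes one round trip; the remainder is server work.
    application_ms = max(first_byte_ms - handshake_ms, 0.0)
    return ResponseBreakdown(handshake_ms, application_ms)


def likely_culprit(handshake_ms: float, first_byte_ms: float) -> str:
    """Name whichever share of the delay is larger (ties go to the network)."""
    breakdown = split_response_time(handshake_ms, first_byte_ms)
    if breakdown.network_ms >= breakdown.application_ms:
        return "network"
    return "application"
```

A 400 ms handshake with a 450 ms time to first byte points at the network, while a 20 ms handshake with a 900 ms time to first byte points at the application.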

Have you ever saved the day with your IT-fu? Or had emailgeddon unleashed on you? Let us know in the comments below.