We all know those days: the server crashes, the last successful backup is anything but up to date, and then the manager appears at the door asking when work can resume. Just when you think things couldn’t get any worse, customers begin asking for answers, and it becomes apparent that the IT crisis has well and truly hit.
With limited time to repair services before customer relations are damaged, it’s down to the IT department to save the day (and sometimes the business’ reputation).
In a world of near-constant connectivity, losing access to everyday operations is bad enough. For critical IT systems that depend on interconnected devices, such as those in healthcare, preventing downtime is vital. As the Internet of Things (IoT) grows and our reliance on data increases, stable systems become paramount. Now is the time for IT professionals to arm themselves against IT failures.
Prevention is always better than cure
When looking to implement a best-practice prevention strategy, it’s important to remember that monitoring a network successfully often means being able to anticipate how the network is behaving at any given time.
To ensure that you are prepared for any crisis, it’s essential that you know what data needs to be accessed and how quickly that data can be found. This is becoming increasingly difficult thanks to the high number of devices now connected to a network.
As a network administrator you need clear insight into how the relevant infrastructure is being used before and during a crisis. You need to know where to shut off data streams that are placing excessive load on the network, and you need to be able to see them first. Effective network monitoring allows for early-stage prevention because you can see where a problem may be starting before it begins to impact the rest of the system.
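As a rough illustration of spotting trouble early, the sketch below flags links whose utilization has crossed a warning level and lists the busiest first. The link names, the sample values and the 80% warning level are all hypothetical; in practice the readings would come from whatever monitoring tool you use.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    link: str            # hypothetical link or interface name
    utilization: float   # fraction of capacity in use, 0.0 to 1.0


def links_over_threshold(samples, warn_at=0.8):
    """Return samples at or above warn_at, busiest first.

    warn_at=0.8 is an assumed warning level, not a recommendation.
    """
    hot = [s for s in samples if s.utilization >= warn_at]
    return sorted(hot, key=lambda s: s.utilization, reverse=True)


# Made-up readings for illustration only.
samples = [
    LinkSample("core-uplink", 0.62),
    LinkSample("backup-vlan", 0.93),
    LinkSample("guest-wifi", 0.85),
]

for s in links_over_threshold(samples):
    print(f"{s.link}: {s.utilization:.0%} of capacity")
```

Sorting by utilization puts the most likely candidate for shutting off at the top of the list, which is the view you want when time is short.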
Having a “Plan B” in place to cover potential risks is essential. I recommend thinking deeply about all the possible scenarios at play and ensuring that they’re covered.
But what happens when we don’t have access to our carefully laid out Plan B? What if an IT issue strikes in the middle of the night and goes unnoticed until the early morning? How do we resolve the situation when it really becomes unmanageable?
Here are our five steps to ensuring that a network availability drama doesn’t turn into a crisis:
1) Alarm Chain & Emergency Contact Persons
As a first step, sit down and define exactly who needs to be informed in a crisis, when and in which sequence. Write down all the contact people involved: those who would need to be alerted and those who will be impacted. List them in the order in which you would need to contact them. Make sure the contact details are complete and keep them updated regularly. I can’t stress this enough: there is nothing worse than pulling out a well-prepared contact list to find that the information is out of date! For more critical roles, one contact option may not be sufficient, so think about substitutes that you can call at all hours of the day. List at least their phone number, mobile number and e-mail address.
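One way to keep such an alarm chain honest is to hold it as structured data and check it regularly. The following is a minimal sketch, not a prescribed format: the field names, the placeholder people, the example.com addresses and the 90-day re-verification window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Contact:
    name: str
    role: str
    phone: str
    mobile: str
    email: str
    last_verified: date                      # when the details were last confirmed
    substitute: Optional["Contact"] = None   # who to call if this person is unreachable


def call_order(chain):
    """Flatten the chain: each contact, followed directly by their substitute."""
    order = []
    for contact in chain:
        order.append(contact)
        if contact.substitute is not None:
            order.append(contact.substitute)
    return order


def stale_contacts(chain, today, max_age_days=90):
    """Return contacts whose details were last verified before the cutoff.

    max_age_days=90 is an assumed review interval.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [c for c in call_order(chain) if c.last_verified < cutoff]


# Placeholder people and details, listed in the order they should be contacted.
deputy = Contact("B. Deputy", "Deputy admin", "555-0101", "555-0102",
                 "deputy@example.com", date(2023, 11, 15))
admin = Contact("A. Admin", "On-call admin", "555-0100", "555-0103",
                "admin@example.com", date(2024, 5, 1), substitute=deputy)
manager = Contact("C. Manager", "IT manager", "555-0104", "555-0105",
                  "manager@example.com", date(2024, 4, 10))
chain = [admin, manager]

for c in stale_contacts(chain, today=date(2024, 6, 1)):
    print(f"Re-verify details for {c.name} ({c.role})")
```

Placing each substitute directly after the person they cover keeps the printed list usable as a simple top-to-bottom call sheet.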
2) Emergency Scenarios & Recovery Plans
As an exercise, write down the worst-case scenarios within your company’s IT environment along with a detailed resolution scenario for each. Brainstorm this with your team and listen to experiences that they may have had in the past. Build out the possible solutions for each scenario. I would suggest you record these exercises because if it ever comes to the worst, you can always take out your notes to find (or build) the best approach to a given crisis with a combination of solutions.
3) Compile, Print and File Your Crisis Process in an External Location
Once you’ve compiled your process, including the alarm chain, the contact people and the possible emergency scenarios, think of the best place to store it. It may not be possible to access your server for a period of time, so it’s essential to also have a hard copy close at hand. Make sure to print out all the required documents. Store a copy in a safe place at your office and file another copy in a separate fire-proof section (or even at home if your employer allows it).
4) Get the Management on Board
The exercise of recording your crisis plans may shed light on shortfalls within your existing operations. Management needs to know about these shortfalls early on and take corrective action where necessary. This may carry financial requirements. Also, once the crisis has passed there may be additional concerns. Just imagine your server room is gutted and you need to replace the whole environment ASAP: what do you need? Either you need really understanding suppliers or you need ready cash! Make sure your CEO and especially your CFO are informed as quickly as possible.
5) Testing & Continuous Improvement
The nature of a crisis is that it usually sheds light on those areas we haven’t planned for, and unfortunately a little too late. The best emergency plan is worthless without proper testing, and making the effort to take this step seriously can bring about new solutions. You need to put everything you have written down to the test periodically, usually once a year. A smooth-running crisis simulation means you’ve already won half the battle if it comes to the worst!
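A simple reminder can help make sure the yearly test actually happens. The sketch below lists scenarios whose last simulation is more than a year old or that have never been tested at all. The scenario names and dates are made up for illustration, and the 365-day interval simply mirrors the once-a-year rule of thumb above.

```python
from datetime import date, timedelta


def overdue_drills(last_tested, today, interval_days=365):
    """Return scenario names that were never tested or were last tested before the cutoff."""
    cutoff = today - timedelta(days=interval_days)
    return sorted(
        name
        for name, tested in last_tested.items()
        if tested is None or tested < cutoff
    )


# Hypothetical scenarios mapped to the date of their last simulation (None = never tested).
last_tested = {
    "server room gutted": date(2023, 3, 1),
    "ransomware on file server": None,
    "core switch failure": date(2024, 1, 15),
}

for name in overdue_drills(last_tested, today=date(2024, 6, 1)):
    print(f"Schedule a simulation for: {name}")
```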
IT failures can be unpleasant, to say the least, but they’re not the end of the world. With a few simple steps you can regain control of your network by relying on the basics: the bare necessities that will get you through the first stages of any IT emergency. With a little luck and careful planning you’ll be able to get back on top of your daily routine in no time.
Martin Hodgson, Head UK & Ireland, Paessler
Image source: Shutterstock/hafakot