Conducting a disaster recovery exercise

Whilst spending time on your Disaster Recovery Plan (DRP) might not feel like a number one priority, having an IT recovery strategy has become crucial for most companies.

The networks and systems in use have not only become more complicated and powerful, but are often relied upon exclusively to provide services to customers and store sensitive data, which increases the severity of any problems that do occur. Having the right procedures in place means that, in an emergency, your company can resume normal activity within a documented timeline.

Although it can be expensive and time-consuming, testing your DRP validates your recovery plan and helps you to identify any procedures that need changing, so that problems either do not occur or are less severe when they do. Remember, disaster recovery is not just reliant on IT; there are people and processes involved that need to be fully prepared so that if disaster strikes, your business can survive.

How often should you test your disaster recovery plan? 

All businesses are different, so there is no single right answer for how often you should test your DRP. The key thing to remember is that your plan needs to be up to date and reflect your current business systems, so a good rule of thumb is to schedule an exercise when these systems change. Exercises may be monthly or annual; your company’s recovery strategy will influence the frequency.

Your DR strategy should be documented and updated as often as your business changes, so that following an incident, resumption of all relevant and current systems occurs in a pre-defined sequence. Major alterations in personnel, operational systems, or devices should always trigger a DRP review, updates to the plan, and the scheduling of an exercise that tests the modifications that occurred.

By incorporating these updates into your change management process, you can ensure that the testing schedule reflects the way your business operates and that provisions are made for appropriate testing at least once a year.
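The scheduling rule described above (a major change or a year without testing should prompt an exercise) can be sketched as a small check inside a change management workflow. This is only an illustration: the function name, the change categories and the 365-day default are assumptions chosen to mirror the text, not part of any particular tool.

```python
from datetime import date, timedelta

# Hypothetical change categories, mirroring the kinds of change named in the text.
MAJOR_CHANGE_TYPES = {"personnel", "operational_system", "device"}


def exercise_due(last_exercise, changes_since, today, max_interval_days=365):
    """Return True if a DRP review and exercise should be scheduled.

    last_exercise: date of the most recent exercise.
    changes_since: iterable of change-category strings logged since then.
    max_interval_days: assumed upper bound between exercises (one year).
    """
    major_change = any(change in MAJOR_CHANGE_TYPES for change in changes_since)
    overdue = (today - last_exercise) >= timedelta(days=max_interval_days)
    return major_change or overdue


# Example: a device change five months after the last exercise.
needs_exercise = exercise_due(date(2024, 1, 1), ["device"], date(2024, 6, 1))
```

Here `needs_exercise` is true because of the logged device change, even though a year has not yet passed.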

Preparing for an exercise 

Before you test your DRP, it is important to prepare properly so that the outcome is realistic and the results are more insightful. By ensuring that everyone with DR responsibilities is involved in the testing process and that at least two people are capable of executing all procedures, your business will be better prepared for a disaster, even if team members are absent due to sickness or vacation, or someone no longer works at the company.
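One way to check the "at least two people" guidance before an exercise is to list every procedure against the staff able to run it and flag the gaps. The sketch below assumes a simple mapping of procedure names to people; the procedure names, staff names and function name are all made up for illustration.

```python
def under_covered_procedures(capable_staff, minimum=2):
    """Return the procedures that fewer than `minimum` distinct people can execute."""
    return sorted(
        name
        for name, people in capable_staff.items()
        if len(set(people)) < minimum
    )


# Hypothetical coverage data: procedure -> people trained to execute it.
coverage = {
    "restore database": ["Asha", "Ben"],
    "fail over network": ["Ben"],
    "notify customers": [],
}

gaps = under_covered_procedures(coverage)
```

In this example `gaps` contains "fail over network" and "notify customers", which are the procedures that would need additional trained staff before the exercise.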

When possible, the DRP creators should take a back seat during the testing process; it is key that the recovery procedures can be performed without them. Although they have a comprehensive knowledge of the DRP, any extra insight or shortcuts the creators might apply when executing the procedures may not be available during a disaster. It is important to capture all “tribal knowledge” in the plan before an exercise begins.

The goal of a DRP exercise is to determine whether a recovery objective (set by your business) is achievable by following the procedures and strategies in place. To make the exercise effective, you should be fully prepared to document the issues encountered and to evaluate the exercise afterwards, so that the areas for change or improvement it identifies can be addressed. It is not enough just to know that something doesn’t work; you need to know what failed and why. Someone who fully understands the DRP should be assigned to observe the exercise, record any issues and take comprehensive notes on areas for improvement.

Recording the details of an exercise is important: note how smoothly the test runs, as well as any major weaknesses and areas for improvement. It is also best practice to record how long the procedures take and to identify any major repercussions or costs that may result when systems and personnel are unavailable.
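The timings recorded during an exercise can be compared directly with the recovery objective set by the business. The following is a minimal sketch of that comparison, assuming each procedure's duration has been noted by the observer; the function name, step names and the four-hour objective are illustrative assumptions.

```python
from datetime import timedelta


def evaluate_timings(step_durations, objective):
    """Compare recorded procedure durations with a recovery time objective.

    step_durations: mapping of procedure name -> timedelta recorded in the exercise.
    objective: timedelta within which recovery is expected to complete.
    """
    if not step_durations:
        raise ValueError("no timings were recorded")
    total = sum(step_durations.values(), timedelta())
    slowest = max(step_durations, key=step_durations.get)
    return {
        "total": total,
        "met_objective": total <= objective,
        "slowest_step": slowest,
    }


# Hypothetical timings noted by the observer during an exercise.
recorded = {
    "restore backups": timedelta(minutes=90),
    "restart applications": timedelta(minutes=45),
    "verify data": timedelta(minutes=30),
}
result = evaluate_timings(recorded, objective=timedelta(hours=4))
```

The sketch assumes procedures run one after another, so the total is a simple sum; steps that run in parallel would need a different calculation. Knowing the slowest step shows where improvement effort is likely to pay off first.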

Ways to test 

It is not always practical to run a full-scale DR exercise, as this can be expensive and time-consuming; there are other ways to validate that your DRP is up to date. To stay on top of the impact that change management has on disaster recovery processes, try to fit the checks identified below into your business schedule at times that work for you.

These methods of disaster recovery testing all have benefits and can help to ensure your business is fully prepared for disaster recovery.

1. Plan Review

A plan review is the most basic DRP test and simply involves the continuity management and disaster recovery planners making time to meet and go through the existing process documentation to identify areas needing updates or changes. This can be carried out regularly without too much of a drain on resources and should be integrated into your business schedule a few times a year.

Checking that everyone involved is aware of their roles and duties during a disaster, and that it is practically possible to carry out the suggested recovery procedures, is also key to a plan review. Changes identified are prepared during the review session as much as possible. To address changes not resolved during the review, team members are given a target completion date. Plan owners track open items to completion. Major issues or areas of change are candidates for inclusion in the next exercise.
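Tracking open items to completion can be as simple as a list of actions with owners and target dates that the plan owner checks regularly. The sketch below is one possible shape for such a list; the `OpenItem` fields, the sample items and the function name are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OpenItem:
    description: str
    owner: str
    target_date: date
    done: bool = False


def overdue_items(items, today):
    """Return unfinished items whose target completion date has passed."""
    return [item for item in items if not item.done and item.target_date < today]


# Hypothetical items left open after a plan review.
items = [
    OpenItem("Update contact list", "Asha", date(2024, 3, 1), done=True),
    OpenItem("Document new backup site", "Ben", date(2024, 3, 15)),
    OpenItem("Revise failover steps", "Chen", date(2024, 5, 1)),
]
late = overdue_items(items, today=date(2024, 4, 1))
```

Here `late` holds only the backup site item: the contact list is complete and the failover steps are not yet due.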

2. Tabletop Exercise

A tabletop exercise is a good way of testing whether everyone involved is fully aware of the DRP and the procedures they must follow in the event of a disaster. This test should be treated as a serious rehearsal and involve the full team getting together to do a ‘walk through’ of a disaster scenario.

A facilitator leads the exercise and describes what is happening as the scenario unfolds. Team members are asked to go through the plan in detail and describe the actions they would take under certain conditions as the scenario progresses. By analysing the team’s responses, glitches in understanding and protocol can be identified and addressed.

3. Full Scale Exercise

A full-scale exercise is where your DRP and processes are validated (or not!). It should not be treated as a rehearsal, but should be as close as possible to a real-life scenario, so it is likely that you will need to spend some time and money on it and account for downtime in systems and personnel.

The scenario and objective set should be laid out as fact and team members should respond to it in this way. It may even be worth considering whether this sort of drill could be kept secret from various members of the team to make the exercise appear more realistic. Expect to use company resources like recovery sites and backup systems, and in some cases allow team members to leave their primary site to implement backup systems and restart the technology at the recovery location.

The benefit of conducting a full-scale exercise is that it gives a true indication of how well your business could recover from a disaster and of the potential problems to look out for.

What if something goes wrong? 

If something goes badly wrong during a DR exercise, it can be extremely concerning, but remember that the purpose of running an exercise is to identify and resolve problems with documented procedures so that they do not occur during a real disaster. You can be sure that any faults that become apparent under test conditions, which are relatively well planned and resourced, will be far more pronounced in a real disaster, so it is vital that any glitches are ironed out as soon as possible.

Record all setbacks and faults in detail. Categorise and investigate these issues after the exercise; use the information collected to fix any issues. Run the exercise again with the changes and updates in place to make sure the issues are fully resolved and have not opened up any new problems.

This may need to be done more than once to ensure complete resolution of an issue and the process should be recorded for future reference.

Sungard Availability Services
