Disaster recovery (DR) is, by its very nature, difficult to plan for. But we’re all well aware of the problems associated with insufficient DR processes, policies and procedures.
If a business's IT infrastructure cannot recover quickly from a ‘disaster’, the consequences can be extremely costly. Simon Johnson, data recovery practice lead at GlassHouse Technologies UK, discusses two key recovery metrics that IT managers should consider when developing a DR strategy.
Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are key measurements that an IT manager needs to make the business aware of, so that downtime can be provisioned for accordingly.
Both are recovery metrics expressed in units of time. They provide quantifiable figures for understanding how much application downtime and data loss the business can tolerate.
RTO is the maximum amount of time the business can allow for recovering from a disruption and becoming operational again. The more aggressive your RTO, the shorter the window in which the system must be restored to normal functioning.
This inevitably means more financial investment in high-availability infrastructure, but that is perhaps a small price to pay in the long run if something does go wrong.
There are many technology options to consider, including various forms of clustering, completely redundant infrastructure, and data replication on site or off site.
The RPO is the maximum amount of data loss that is acceptable in the event of a disruption. The business must ask itself: “How much can we afford to lose?”
For example, if there is a nightly backup at 21:00 and the system fails at 07:30 the following day, the system will have lost all data modifications since the backup at 21:00 the previous night. The question is – is that loss acceptable to the business?
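The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function names `data_loss_window` and `meets_rpo` are hypothetical, the calendar dates are arbitrary, and the 4-hour and 24-hour RPO values are assumptions chosen to show both outcomes.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return the span of data modifications lost: everything since the last backup."""
    if failure < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure - last_backup


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the data lost at the moment of failure stays within the RPO."""
    return data_loss_window(last_backup, failure) <= rpo


# Nightly backup at 21:00, failure at 07:30 the next day (dates are arbitrary).
last_backup = datetime(2024, 1, 1, 21, 0)
failure = datetime(2024, 1, 2, 7, 30)

window = data_loss_window(last_backup, failure)
print(window)  # 10:30:00

# Assumed RPO values, for illustration only.
print(meets_rpo(last_backup, failure, timedelta(hours=4)))   # False
print(meets_rpo(last_backup, failure, timedelta(hours=24)))  # True
```

With a single nightly backup, the loss in this scenario is ten and a half hours, so an RPO of four hours would be missed while an RPO of 24 hours would be met.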
As with RTO, the more aggressive the RPO, the greater the financial investment in infrastructure required to meet the shorter objective.
Some businesses, or areas within a business, may not be able to tolerate RTOs and RPOs longer than a few hours, while others may be able to survive downtime of, say, a week with minimal impact.
These requirements can normally be determined by the Service Level Agreements (SLAs).
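As a rough sketch of how such requirements might be recorded and checked, the Python below stores an RTO and RPO per application and flags which objectives a measured recovery would miss. Everything here is an assumption made for illustration: the `RecoveryObjective` class, the `find_gaps` function, the application names and all of the figures are hypothetical, not taken from any real SLA.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """Hypothetical record of the objectives agreed for one application."""
    application: str
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable data loss, expressed as time


def find_gaps(objective: RecoveryObjective,
              measured_recovery_time: timedelta,
              measured_data_loss: timedelta) -> list[str]:
    """Return the names of the objectives that the measured recovery misses."""
    gaps = []
    if measured_recovery_time > objective.rto:
        gaps.append("RTO")
    if measured_data_loss > objective.rpo:
        gaps.append("RPO")
    return gaps


# Illustrative objectives only: one application tolerating a few hours,
# another tolerating about a week of downtime.
order_entry = RecoveryObjective("order-entry", rto=timedelta(hours=2), rpo=timedelta(hours=1))
archive = RecoveryObjective("archive", rto=timedelta(days=7), rpo=timedelta(days=1))

# Assumed measurements from a recovery test: 3 hours to restore, 10.5 hours of data lost.
recovery_time = timedelta(hours=3)
data_loss = timedelta(hours=10, minutes=30)

print(find_gaps(order_entry, recovery_time, data_loss))  # ['RTO', 'RPO']
print(find_gaps(archive, recovery_time, data_loss))      # []
```

The same measured recovery fails both objectives for the aggressive application and satisfies both for the tolerant one, which is why objectives need to be set per application rather than once for the whole business.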
For years, businesses and their IT departments have struggled to understand and communicate effectively with each other, resulting in significant under- or over-investment in both operational and disaster recovery application protection.
Accurate RPO and RTO metrics have helped bridge this gap. Combined with business impact analysis, they help align applications to the correct data protection levels and justify an appropriate level of investment in protecting data.
SLAs are unachievable unless a business has the capabilities to deliver them. Organisations need to understand how and where data protection is delivered in order to optimise operations and meet the SLAs.
Although they typically play significant roles, backups, snapshots and mirrors do not deliver RTOs and RPOs on their own. Many levels of resilience throughout the IT supply chain combine to deliver recovery capabilities. These must all be accurately measured to determine the RPOs and RTOs that can actually be achieved.
These quantifiable objectives translate business requirements into tangible metrics, which guide the selection of infrastructure so that the SLAs can be met even in an unforeseen disaster.