For decades, Disaster Recovery (DR) has dominated the landscape as the preferred architecture for business continuity. The problem is that DR expects you to have a disaster and then to recover from it. Because the standby capacity sits idle until disaster strikes, many customers call this model “Dark DR.”
In today’s digital business world, of course, disasters aren’t tolerated well. A majority of organisations cite considerable loss of revenue and/or reputation if their online offerings go down. Rather than build DR structures, organisations today need to design for continuous availability, which in turn requires active/active architectures.
Along with meeting the new demands of “always on” business, continuous availability models also avoid the inefficient economics of DR. Historically, active/active operations created their own technical and economic obstacles, including major challenges in having applications talk to database resources across multiple locations. Fortunately, new approaches mitigate many of those challenges.
Active/active operations provide a spectrum of advantages to enterprises, including:
- Lower total operational costs
- Improved asset utilisation
- Seamless scalability
- Dramatically higher uptime
- Improved end-user experience
- Superior workload performance
A close look at the technical and economic factors underlying dark DR and active/active strategies reveals the pros and cons of each approach. Ultimately, though, with demand for continuous availability dominating nearly every industry, active/active operations clearly provide the best mix of advantages.
Limitations of Dark DR
Best practices dictate keeping data and redundant systems safely replicated at a location far away from the primary data centre in case of some regional disaster. This model requires a regular flow of information between the two (or more) sites. Replicating a copy of the data for backup reasons is fairly straightforward. Using those secondary systems live along with the primary systems, however, has historically been very challenging at the application layer. The dominant model has been to employ a robust primary server able to handle the full workload (with plenty of overhead to spare) backed by a remote secondary server.
The two servers run replication software, which sends a copy of the primary’s data to the secondary server to allow for information recovery, but that secondary server is passive, or “dark” – i.e., the application doesn’t talk to it during normal operations. The “dark DR” model suffers from several technical disadvantages. First, operations suffer when the primary server fails because the secondary server frequently lacks some of the information, applications, and customised code the primary server holds. IT staff often overlook DR systems, which then fall out of sync with the primary systems. Those missing pieces must be identified, organised, and migrated to the secondary server before operations can be restored.
In addition, the primary server’s normal workflow must be redirected to the secondary server, which becomes, at least temporarily, the new primary server. This redirection can require significant amounts of manual configuration, with two IT teams (one at each location) working overtime to enable and troubleshoot the switch. Similar reconfiguration applies to DNS, networking, replication topology, and other infrastructure elements. Testing requirements are massive, and additional IT staff must step into place at the secondary facility while the original IT team remains pinned down trying to get the primary facility back online.
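The drift problem described above, where the secondary site quietly falls behind the primary, can be illustrated with a small comparison routine. This is only a sketch under stated assumptions: the component names, version strings, and the `find_drift` helper are hypothetical, and a real inventory would come from whatever configuration or asset tooling an organisation already uses.

```python
def find_drift(primary: dict[str, str], secondary: dict[str, str]) -> dict[str, list[str]]:
    """Compare two site inventories (component name -> version).

    Returns the components missing from the secondary site and those
    present at both sites but at a different version.
    """
    missing = sorted(name for name in primary if name not in secondary)
    stale = sorted(
        name
        for name in primary
        if name in secondary and secondary[name] != primary[name]
    )
    return {"missing": missing, "stale": stale}


# Hypothetical inventories, invented purely for illustration.
primary_site = {"app": "2.4.1", "db-schema": "17", "billing-plugin": "1.0.3"}
secondary_site = {"app": "2.3.0", "db-schema": "17"}

drift = find_drift(primary_site, secondary_site)
print(drift)  # {'missing': ['billing-plugin'], 'stale': ['app']}
```

In a dark DR setup, a check like this tends to run only when someone remembers to run it, which is one reason the gaps surface during the failover itself.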
Dark costs of DR
For those who follow the DR strategy, running a full-capacity redundant system in a secondary site represents a necessary yet considerable ongoing expense to the enterprise. Too often, executives see this investment as inescapable and so turn a blind eye to the factors comprising that expense.
These costs include:
- Extra maintenance. While the primary server runs continuously, the DR server passively waits, but must be ready to serve traffic, meaning that enterprises maintain twice the infrastructure for the same amount of operational activity.
- Extra staffing. Any maintenance on the active system forces the primary server to shut down. Businesses must keep enough IT staff on hand to deploy in both locations – one team to complete maintenance on the primary server and the other to make the passive DR server active.
- Downtime costs. Disaster recovery procedures can take hours to recover information and restore operations, which incurs lost revenue and unmet service level agreements. Given an average (and increasing) rate of $7,900 per minute (Ponemon Institute), downtime creates a potentially huge cost for enterprises, both in immediate business and long-term reputation.
- Lost business. Slower application performance caused by overloaded servers results in lost revenue, as shoppers are more likely to abandon their carts when pages lag.
The costs associated with running a full-capacity redundant system in a secondary site can be numerous and subtle. Those costs can be especially hard to swallow when expected returns on infrastructure investments prove elusive.
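To put the downtime figure into perspective, the arithmetic can be sketched in a few lines. The per-minute rate is the Ponemon Institute average quoted above; the outage durations are hypothetical values chosen only for illustration, and actual costs will vary widely by business.

```python
# Average downtime cost per minute quoted in the article (Ponemon Institute).
COST_PER_MINUTE_USD = 7_900


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate the cost of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# Hypothetical outage lengths, in hours.
for hours in (1, 4):
    print(f"{hours} h outage: ${downtime_cost(hours * 60):,.0f}")
# 1 h outage: $474,000
# 4 h outage: $1,896,000
```

At that rate, a recovery procedure that takes "hours" reaches seven figures after a little over two hours.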
Advantages of active/active
Given the limitations of DR, businesses need an alternative, particularly as webscale IT practices filter down from the likes of Google and Facebook into mainstream enterprises. These organisations have not only introduced the world to new IT practices, but more importantly, they have reset expectations among users. These “personal” apps perform so well and so consistently that enterprise users now apply the same standard to all applications they use, including enterprise applications.
The active/active model offers several notable technical advantages:
- It enables a smooth failover, meaning operations transition from the failing server to the other server(s) with no interruption in services.
- A team can perform maintenance on one system while the other stays active.
- Businesses can cut expenses by moving workloads in response to changing cost factors, such as local energy or real estate costs that affect a data centre’s financial viability.
- Applications can handle more traffic due to the scaling of capacity.
- Spreading workloads across locations lowers the load at each one, which frees capacity to absorb traffic growth.
- Security improves because IT can patch a vulnerability on demand rather than waiting for the next maintenance window.
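The failover and maintenance points in the list above can be sketched as a toy request router. This is a minimal illustration, not a description of any particular product: the `Site` and `ActiveActiveRouter` names and the site names are hypothetical, and a production system would also need health checks, data replication, and connection draining.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool = True


class ActiveActiveRouter:
    """Round-robin over all sites, skipping any that are marked unhealthy."""

    def __init__(self, sites):
        self._sites = list(sites)
        if not self._sites:
            raise ValueError("at least one site is required")
        self._next = 0

    def route(self) -> Site:
        # Try each site at most once, starting from the round-robin cursor.
        for _ in range(len(self._sites)):
            site = self._sites[self._next]
            self._next = (self._next + 1) % len(self._sites)
            if site.healthy:
                return site
        raise RuntimeError("no healthy site available")


# Hypothetical site names.
london = Site("london")
frankfurt = Site("frankfurt")
router = ActiveActiveRouter([london, frankfurt])

normal = [router.route().name for _ in range(4)]
london.healthy = False  # simulate a failure or a maintenance window
degraded = [router.route().name for _ in range(4)]

print(normal)    # ['london', 'frankfurt', 'london', 'frankfurt']
print(degraded)  # ['frankfurt', 'frankfurt', 'frankfurt', 'frankfurt']
```

Both sites serve traffic in normal operation, so taking one out of rotation changes where requests land rather than triggering a separate recovery procedure.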
These technical advantages also pay dividends economically:
- By spreading the traffic load across multiple systems, organisations put less strain on servers, extending the functional life of hardware.
- Lower site use means lower hardware expenses. While a dark DR system means a total extra cost of 2.5 to 3 times that of a single centre, an active/active setup increases costs by only 1.4 to 1.8 times. That is because organisations don't need as much hardware in each location.
- For many organisations, increased application performance leads directly to enhanced revenue, such as from speeding up transactions on e-commerce sites so that customers are less inclined to abandon shopping carts.
- Maintenance costs are lower because the tasks can be done during work hours rather than requiring a crew in the middle of the night. They also require fewer staff members because organisations can keep the application running during maintenance, so developers and other application specialists don’t need to be involved.
- With little to no downtime required for maintenance, organisations can further increase revenue that otherwise would have been lost during those offline periods.
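The hardware multipliers quoted above (2.5 to 3 times a single centre for dark DR, 1.4 to 1.8 times for active/active) can be turned into a rough comparison. The single-site cost below is a hypothetical figure used only to make the ranges concrete, and the helper names are illustrative.

```python
# Cost multipliers relative to a single data centre, as quoted in the article.
DARK_DR_MULTIPLIER = (2.5, 3.0)
ACTIVE_ACTIVE_MULTIPLIER = (1.4, 1.8)


def total_cost_range(single_site_cost: float, multiplier_range: tuple) -> tuple:
    """Return the (low, high) total cost for a given multiplier range."""
    low, high = multiplier_range
    return (single_site_cost * low, single_site_cost * high)


def savings_range(single_site_cost: float) -> tuple:
    """Return the (smallest, largest) saving of active/active over dark DR."""
    dr_low, dr_high = total_cost_range(single_site_cost, DARK_DR_MULTIPLIER)
    aa_low, aa_high = total_cost_range(single_site_cost, ACTIVE_ACTIVE_MULTIPLIER)
    # Smallest saving: cheapest dark DR against the most expensive active/active.
    # Largest saving: most expensive dark DR against the cheapest active/active.
    return (dr_low - aa_high, dr_high - aa_low)


# Hypothetical cost of running a single centre.
single_site = 1_000_000
low, high = savings_range(single_site)
print(f"Estimated saving: ${low:,.0f} to ${high:,.0f}")
```

Even in the least favourable pairing of the quoted ranges, the active/active estimate comes out below the dark DR estimate.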
Any enterprise can overcome the technical challenges of implementing an active/active architecture and reap the benefits of the Continuous Availability model it enables.
Michelle McLean, VP of Marketing, ScaleArc