Making high availability more cost-effective in a hybrid cloud environment

(Image credit: Melpomene / Shutterstock)

For some applications, the private cloud remains the best choice for a variety of reasons. For others, the public cloud has become the more capable and cost-effective choice. The result is a hybrid cloud architecture that facilitates some new and potentially beneficial capabilities. One such capability involves leveraging the additional agility and scalability afforded in a hybrid cloud to implement the different high availability and/or disaster recovery protections needed for different applications, regardless of where each runs.

This article examines the hybrid cloud from the perspective of high availability (HA) and disaster recovery (DR), and offers some suggestions for making configurations more cost-effective. The hybrid cloud, when used prudently, can afford considerable savings over maintaining multiple enterprise datacentres to implement robust HA and/or DR protections. Of course, those less critical applications that do not require such protections can also be good candidates for migrating in whole or in part to the public cloud.

Cloudy conditions

Cloud service providers (CSPs) have implemented carrier-class infrastructures to give the public cloud a resiliency that far exceeds anything that could be justified for a single enterprise. Redundancies exist within every datacentre, and there are multiple datacentres in every region and multiple regions around the globe, all of which give the cloud its unprecedented scalability and reliability. Failures can and do occur, however, and some of them cause downtime for customers who have not made special provisions to assure high availability for their applications.

Each CSP defines “downtime” somewhat differently in its service level agreement (SLA), and all of them exclude certain causes of downtime at the application level. In effect, the SLAs guarantee only the equivalent of “dial tone” for the virtual machine (VM) or physical server, or more specifically, that at least one instance has connectivity to the external network when two or more instances are deployed across multiple availability zones.

Here are just three examples of common causes of downtime excluded from SLAs:

  • faulty actions, or a lack of action when required (which covers the mistakes inevitably made by mere mortals)
  • the customer’s software or third-party software, including application software (such as SQL Server or SAP)
  • factors beyond the CSP’s reasonable control (including carrier network outages)

Another limitation is the lack of a Storage Area Network (SAN) or other form of shared storage in the cloud. Of all the options designed to address this limitation, the purpose-built SANless failover cluster is capable of meeting the most demanding recovery time and recovery point objectives for all mission-critical applications for both Windows Server and Linux. SANless failover clustering software works in private, public and hybrid clouds, and its ability to detect failures at the application and database levels eliminates the gap created by the downtime exclusions in the CSPs’ SLAs.
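The gap between what an SLA counts as downtime and what users experience can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model: the `Instance` class, its fields and both function names are hypothetical, and no real CSP or clustering product exposes this exact interface.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Instance:
    """Hypothetical, simplified view of one VM in a deployment."""
    zone: str                  # availability zone label (illustrative)
    network_reachable: bool    # the "dial tone" the SLA is concerned with
    application_healthy: bool  # e.g. the database is actually answering queries


def sla_counts_as_downtime(instances: Sequence[Instance]) -> bool:
    """Infrastructure-level view: downtime only if no instance has connectivity."""
    return not any(i.network_reachable for i in instances)


def application_is_down(instances: Sequence[Instance]) -> bool:
    """Application-level view: down unless some reachable instance is healthy."""
    return not any(i.network_reachable and i.application_healthy for i in instances)


# Two instances in different zones, both reachable, but the application
# has failed on both (an assumed scenario, e.g. a software fault).
deployment = [
    Instance(zone="zone-a", network_reachable=True, application_healthy=False),
    Instance(zone="zone-b", network_reachable=True, application_healthy=False),
]

print("SLA downtime:", sla_counts_as_downtime(deployment))
print("Application down:", application_is_down(deployment))
```

In this scenario the SLA condition is still met even though the application is unavailable, which is the kind of failure that monitoring at the application and database levels is meant to catch.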

A hybrid HA/DR cloud

One common configuration for a hybrid cloud is to have the public cloud provide DR protection for applications running in the private cloud. Such an arrangement is ideal for enterprises with only a single datacentre and it can be used for all applications, whether they have HA protection or not. Because a SAN can be deployed in an enterprise datacentre, it is possible to use traditional failover clustering for HA protection. Given the high cost of SANs, however, many organisations are now choosing to use a SANless failover clustering solution instead.

A variation on this arrangement employs SANless failover clustering for both HA and DR protection, with HA in the private cloud and DR in the public cloud. It suits enterprises with only a single datacentre equally well, and having a single HA/DR solution simplifies implementation and ongoing management. It is recommended that separate racks be used in the enterprise datacentre to provide additional resiliency, and that a remote region be specified in the public cloud to afford better protection against widespread disasters.

It is worth noting that both Microsoft and Amazon now have managed DR-as-a-Service (DRaaS) offerings: Azure Site Recovery and CloudEndure Disaster Recovery, respectively. These services support hybrid cloud configurations like the one in the example and are reasonably priced. But they often do not support replicating clustered applications, and they come with some bandwidth limitations that preclude their use for many applications.

Other ways to optimise hybrid cloud price/performance

Here are some additional suggestions for managing resource utilisation in the cloud in ways that can lower costs while maintaining adequate service levels for all applications, including those that require mission-critical high uptime and throughput:

  • Right-size resource utilisation for optimal price/performance, paying particular attention to compute resources, which are the most expensive.
  • For existing applications, reduce allocations gradually while monitoring performance constantly until achieving diminishing returns.
  • For new applications, start with minimal VM configurations for compute, adding CPU cores, memory and/or I/O only as required to achieve satisfactory performance.
  • Storage is relatively inexpensive in the cloud, but be careful using “cheap” storage because I/O might incur a separate (and costly) charge with some services.
  • If available, make use of potentially more cost-effective performance-enhancing technologies like tiered storage, caching and/or in-memory databases to help optimise configurations.
  • Consider reducing software licensing costs by migrating applications from Windows Server to Linux, and from SQL Server’s Always On Availability Groups in the Enterprise Edition to Failover Cluster Instances in the Standard Edition. Both migrations are made possible by using SANless failover clustering.
  • Take advantage of any discounts available, such as pre-paying for services or lengthening service commitments.
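
The two right-sizing suggestions above (stepping existing applications down gradually, and starting new applications small) can be sketched as a pair of simple loops. This is only an illustration under stated assumptions: the `VmSize` class, the size names and the hourly costs are invented for the example, and `meets_target` stands in for whatever performance monitoring an organisation actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class VmSize:
    """Hypothetical VM size; names and prices are illustrative only."""
    name: str
    vcpus: int
    hourly_cost: float


def right_size_down(sizes: Sequence[VmSize],
                    meets_target: Callable[[VmSize], bool]) -> VmSize:
    """Existing application: `sizes` runs from the current (largest)
    allocation downwards. Step down one size at a time and stop at the
    last size that still meets the performance target."""
    chosen = sizes[0]
    for candidate in sizes[1:]:
        if not meets_target(candidate):
            break
        chosen = candidate
    return chosen


def right_size_up(sizes: Sequence[VmSize],
                  meets_target: Callable[[VmSize], bool]) -> VmSize:
    """New application: `sizes` runs from minimal upwards. Return the
    first (cheapest) size that achieves satisfactory performance."""
    for candidate in sizes:
        if meets_target(candidate):
            return candidate
    raise ValueError("no candidate size meets the performance target")


# Illustrative catalogue, ordered largest to smallest.
SIZES = [
    VmSize("large", vcpus=16, hourly_cost=0.80),
    VmSize("medium", vcpus=8, hourly_cost=0.40),
    VmSize("small", vcpus=4, hourly_cost=0.20),
    VmSize("tiny", vcpus=2, hourly_cost=0.10),
]


def needs_four_vcpus(size: VmSize) -> bool:
    """Stand-in for real monitoring: assume the workload needs 4 vCPUs."""
    return size.vcpus >= 4


existing_choice = right_size_down(SIZES, needs_four_vcpus)
new_choice = right_size_up(list(reversed(SIZES)), needs_four_vcpus)
print(existing_choice.name, new_choice.name)
```

In practice the check would be a sustained measurement of throughput or latency over a representative period rather than a single yes/no test, and each step down would be left in place long enough to observe peak load.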

Confidence in the cloud

The cloud’s resilient, carrier-class infrastructure makes it eminently capable of providing carrier-class HA/DR protection for enterprise applications. Using a SANless failover clustering solution adds carrier-class high availability, but without the carrier-class price tag. The ease of implementation and operation, combined with the cluster’s effective and efficient use of the cloud’s compute, storage and networking resources, minimises ongoing costs and makes robust HA and DR protections more affordable than ever before.

David Bermingham, Technical Evangelist, SIOS Technology