High availability can be costly. It requires deploying or allocating standby resources that are rarely used. Data is constantly being replicated, which consumes precious bandwidth. Configurations should be tested under all possible failure scenarios. For SQL Server running on Windows Server, the more robust protection provided by Always On Availability Groups requires licensing the more expensive Enterprise Edition. System and database administrators under pressure to reduce costs could use some help.
This article highlights various ways to reduce ongoing costs without sacrificing high availability (HA) and disaster recovery (DR) protections for SQL Server databases running in the Amazon Web Services (AWS) cloud.
SQL Server offers two commonly used options for HA and DR protections: Failover Cluster Instances (FCIs) and Always On Availability Groups. FCIs have two major advantages: inclusion in the less expensive Standard Edition and seamless protection for the entire SQL Server instance. A major disadvantage is the dependency FCIs have on Windows Server Failover Clustering (WSFC), which requires shared storage, such as a storage area network (SAN), to share data between the active and standby instances. The problem is that AWS and other clouds all use local rather than shared storage.
The Datacentre Edition of Windows Server 2016 addressed the lack of shared storage in the cloud with Storage Spaces Direct (S2D), a new feature that also received concurrent support in SQL Server 2016. S2D is software-defined storage capable of creating a virtual SAN that satisfies WSFC’s need for shared storage, including support for SMB3 file shares. But S2D requires that the servers be deployed within a single datacentre, making it incompatible with the AWS Availability Zones and Regions normally used to provide HA and DR protections, respectively.
The other SQL Server option is Always On Availability Groups, which requires licensing the more expensive Enterprise Edition. The high cost can be justified for some needs, such as very large databases and those requiring readable secondaries. But the increase over the Standard Edition can be difficult to justify purely for HA/DR purposes for many, if not most, database applications.
It is worth noting that SQL Server Standard Edition also offers a Basic Availability Groups feature, but it supports only a single database per Availability Group, making it suitable for only the smallest of environments.
Using limited or application-specific features like Always On Availability Groups has another disadvantage: It creates a need for deploying different HA and/or DR solutions for different applications. And having multiple HA/DR solutions leads to an inevitable increase in complexity and costs for licensing, training, implementation and ongoing operations.
Consolidating HA and DR protections in a SANless failover cluster
These and other challenges have long been overcome by general-purpose failover clustering solutions purpose-built for providing HA and DR protections in private, public and hybrid cloud environments. These solutions are implemented entirely in software that creates failover clusters of physical or virtual servers and storage—sans SANs—to assure high availability for virtually all applications.
Versions for Windows Server normally work seamlessly with WSFC by providing real-time block-level data replication both on-premises and in a cloud-based SANless environment. In addition to being able to work with FCIs, these solutions usually overcome another limitation in the Standard Edition of SQL Server: support for only a single standby FCI node. The ability to have a two-node cluster spanning AWS Availability Zones, along with a third instance in a different Region, as described in the example below, consolidates mission-critical HA and DR protections in a single, cost-effective configuration.
Versions for Linux, which lacks a capability equivalent to WSFC, must provide a complete solution that includes data replication, continuous application-level monitoring and configurable failover/failback recovery policies. Using Linux for SQL Server and other applications affords considerable savings, and third-party SANless failover clustering solutions now make configuring HA/DR protections nearly as easy as it is for Windows Server.
In addition to enabling administrators to have a single, easy-to-manage, application-agnostic HA/DR solution (albeit with different versions for Windows Server and Linux), most third-party failover clustering solutions also offer a variety of other cost-cutting capabilities. Examples include having minimalist “warm” standby configurations, using data compression and other forms of WAN optimisation to reduce bandwidth utilisation, and enabling manual switchover of active and standby instances to simplify performing planned maintenance and routine backups.
The ability to “undersize” standby instances can deliver considerable cost savings. Because standby instances operating in their standby mode do not actually run production workloads, they can be configured with minimal resources (CPU, memory and network bandwidth) at a minimal ongoing cost. The trade-off is the extra step needed to “upsize” and reboot the instance during a failover, which slightly increases the recovery time. There are other factors to consider, as well, such as the potential for I/O and storage limitations in smaller instance types that may preclude their use in some situations. But when viable, the cost savings can be significant.
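To make the trade-off concrete, the saving from an undersized standby can be estimated with simple arithmetic. The sketch below is illustrative only: the hourly prices are hypothetical placeholders rather than actual AWS rates, and the StandbyPlan class and monthly_savings function are names invented for this example.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # commonly used average number of hours in a month


@dataclass(frozen=True)
class StandbyPlan:
    """One active instance paired with one standby instance."""
    name: str
    active_hourly: float   # hypothetical hourly price of the active instance
    standby_hourly: float  # hypothetical hourly price of the standby instance

    def monthly_cost(self) -> float:
        # Both instances run around the clock, so both are billed every hour.
        return (self.active_hourly + self.standby_hourly) * HOURS_PER_MONTH


def monthly_savings(full: StandbyPlan, undersized: StandbyPlan) -> float:
    """Monthly difference between a full-size and an undersized standby."""
    return full.monthly_cost() - undersized.monthly_cost()


# Hypothetical prices chosen only to show the calculation.
full_size = StandbyPlan("full-size standby", active_hourly=2.00, standby_hourly=2.00)
undersized = StandbyPlan("undersized standby", active_hourly=2.00, standby_hourly=0.25)

if __name__ == "__main__":
    print(f"{full_size.name}: {full_size.monthly_cost():.2f} per month")
    print(f"{undersized.name}: {undersized.monthly_cost():.2f} per month")
    print(f"estimated saving: {monthly_savings(full_size, undersized):.2f} per month")
```

A fuller estimate would also weigh the longer recovery time against the saving, since the standby must be resized and rebooted before it can carry the production workload.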
Additional savings can be achieved by compressing the data that traverses the WAN, particularly in a hybrid cloud environment. Because higher levels of compression require more CPU capacity, some tweaking is usually needed to achieve the desired balance.
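The balance between compression level and CPU time can be explored with a small experiment. The sketch below uses zlib from the Python standard library purely as a stand-in; it does not represent the compression used by any particular clustering product, and the sample payload and the compression_profile function are assumptions made for illustration.

```python
import time
import zlib


def compression_profile(payload: bytes, levels=(1, 6, 9)):
    """Return (level, compressed_size, seconds) for each zlib level tried."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Confirm the data survives a round trip before recording the result.
        assert zlib.decompress(compressed) == payload
        results.append((level, len(compressed), elapsed))
    return results


# Hypothetical payload: repetitive text standing in for replicated data.
sample = b"INSERT INTO orders VALUES (42, 'widget', 19.99);\n" * 20000

if __name__ == "__main__":
    print(f"original size: {len(sample)} bytes")
    for level, size, seconds in compression_profile(sample):
        print(f"level {level}: {size} bytes in {seconds * 1000:.2f} ms")
```

Real replication traffic is usually far less repetitive than this sample, so ratios and timings measured on representative data are the ones that should guide the setting.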
One of the most cost-effective ways to consolidate HA and DR protections is to use a single Virtual Private Cloud (VPC) that distributes three SQL Server instances across multiple AWS Availability Zones and Regions. The configuration consists of a two-node HA failover cluster spanning two Availability Zones, along with a third instance deployed in another Region to facilitate full recoveries from widespread disasters. For the HA failover cluster, the data replication is synchronous, enabling rapid automatic failovers. For the DR instance in the separate Region, the data replication is asynchronous to avoid adversely affecting throughput, and failovers employ manual processes to minimise the potential for data loss.
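The three-instance layout described above can be written down as data and checked against its own rules. This is a minimal sketch, not the configuration format of any clustering product: the Node class, the node, Region and zone names, and the check_topology function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    region: str
    zone: str
    role: str         # "active", "ha-standby" or "dr-standby"
    replication: str  # "source", "synchronous" or "asynchronous"
    failover: str     # "n/a", "automatic" or "manual"


# Hypothetical names; the Regions and zones are placeholders.
TOPOLOGY = [
    Node("sql-primary", "region-a", "zone-1", "active", "source", "n/a"),
    Node("sql-ha", "region-a", "zone-2", "ha-standby", "synchronous", "automatic"),
    Node("sql-dr", "region-b", "zone-1", "dr-standby", "asynchronous", "manual"),
]


def check_topology(nodes):
    """Return a list of rule violations; an empty list means the layout fits."""
    active = [n for n in nodes if n.role == "active"]
    if len(active) != 1:
        return ["exactly one active node is required"]
    primary = active[0]
    problems = []
    for n in nodes:
        if n.role == "ha-standby":
            # HA standby: same Region, different zone, synchronous, automatic.
            if n.region != primary.region or n.zone == primary.zone:
                problems.append(f"{n.name}: must be in another zone of the same Region")
            if n.replication != "synchronous" or n.failover != "automatic":
                problems.append(f"{n.name}: expected synchronous replication, automatic failover")
        elif n.role == "dr-standby":
            # DR standby: different Region, asynchronous, manual.
            if n.region == primary.region:
                problems.append(f"{n.name}: must be in a different Region")
            if n.replication != "asynchronous" or n.failover != "manual":
                problems.append(f"{n.name}: expected asynchronous replication, manual failover")
    return problems


if __name__ == "__main__":
    print(check_topology(TOPOLOGY) or "topology matches the described layout")
```

Encoding the rules this way makes it easy to catch a misplaced instance, such as a DR node accidentally launched in the same Region as the primary, before it matters.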
A similar configuration is also possible in a hybrid cloud environment. For example, the two-node HA failover cluster could be deployed in the AWS cloud with the third instance for DR running in a minimalist virtual machine in an enterprise datacentre—or vice versa.
Cutting costs not corners
The state-of-the-art global AWS infrastructure is eminently capable of providing carrier-class HA/DR protections for SQL Server databases. But implementing carrier-class protections need not require paying a carrier-like high cost when they are consolidated on a purpose-built failover clustering solution. By being easy to implement and operate, while also making effective and efficient use of all AWS compute, storage and networking resources, SANless failover clustering software minimises ongoing costs, resulting in robust HA and DR protections now being more affordable for more applications than ever before.
David Bermingham, Technical Evangelist, SIOS Technology