
Disaster recovery: Should I use array- or LAN-based replication for SRM?

VMware Site Recovery Manager (SRM) is a disaster recovery orchestration product that protects virtual machines by replicating them to a secondary site, using either storage array-based or network-based replication.

The decision on which replication technology to use should be determined by business requirements, specifically the recovery point objectives (RPO) defined in the disaster recovery service level agreement. These requirements should be matched to the capabilities and scalability of the replication technology and balanced against costs and other considerations.

Array-based replication (ABR) uses an SRM Storage Replication Adapter (SRA) to leverage the replication and snapshot capabilities of the storage array. This allows high-performance synchronous or asynchronous replication of large amounts of data. Where an SLA demands a low RPO (minimal loss of data), ABR remains the preferred option due to its synchronous replication capability. However, this performance comes at a cost: storage infrastructure from the same vendor is required at both sites, and features such as array replication and snapshots generally incur additional licensing costs.

Alternatively, SRM can leverage the vSphere Replication (VR) feature, included in vSphere 5.1, which utilises the hypervisor to replicate over the network on a per-VM basis. This approach is more flexible because it allows replication between dissimilar storage and is independent of the storage protocol, so low-end storage (even direct-attached storage) or cloud infrastructure can be used at the failover site to reduce costs. SRM with VR supports features such as failback and re-protect, which were previously available only with ABR. Microsoft Volume Shadow Copy Service (VSS) can additionally be used to quiesce application data during replication passes to ensure data consistency, and recovery to multiple points in time allows a rollback to a known consistent state.
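The rollback idea can be illustrated with a small Python sketch. It does not use any VMware API: the `RecoveryPoint` record and the `latest_consistent_point` function are hypothetical, and the sketch simply assumes each retained point-in-time instance records when it was taken and whether VSS quiescing succeeded. Given the time a problem was noticed, it picks the newest application-consistent point taken before then.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryPoint:
    """Hypothetical record of one retained point-in-time replica."""
    taken_at: datetime
    quiesced: bool  # True if VSS quiescing succeeded for this replication pass


def latest_consistent_point(points: Sequence[RecoveryPoint],
                            before: datetime) -> Optional[RecoveryPoint]:
    """Newest quiesced recovery point taken strictly before `before`, if any."""
    candidates = [p for p in points if p.quiesced and p.taken_at < before]
    return max(candidates, key=lambda p: p.taken_at, default=None)


points = [
    RecoveryPoint(datetime(2013, 1, 7, 9, 0), quiesced=True),
    RecoveryPoint(datetime(2013, 1, 7, 9, 15), quiesced=False),
    RecoveryPoint(datetime(2013, 1, 7, 9, 30), quiesced=True),
]
# Corruption noticed at 09:20: the 09:15 point is not quiesced, so roll back to 09:00.
chosen = latest_consistent_point(points, before=datetime(2013, 1, 7, 9, 20))
print(chosen.taken_at)  # 2013-01-07 09:00:00
```

Skipping non-quiesced instances reflects the point of VSS: a crash-consistent copy may be newer, but only a quiesced one is known to be consistent at the application level.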

A disadvantage of VR is its comparatively lower performance: at best a 15-minute RPO, compared with the synchronous replication possible with ABR. Because of this limitation, VR is not suitable where minimal data loss is required, for example a database tier; it is better suited to more static systems, such as a web application server tier. Running replication also places a performance load on the host, and there are limits on the total number of replica VMs that can be supported (500, as opposed to 1,000 with ABR). Certain features, such as linked clones, physical-mode raw device mappings (RDMs) and fault tolerance, are not supported, although this may be addressed in future releases.
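The trade-offs above can be condensed into a rough screening check. The Python sketch below is a hypothetical helper, not part of SRM. It hard-codes the figures quoted in this article (15-minute minimum VR RPO, 500 VMs for VR, 1,000 for ABR), which should be treated as assumptions and checked against current VMware documentation. It also assumes ABR can meet any RPO because it can replicate synchronously. The `Requirements` fields and feature labels are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

# Limits quoted in the article for SRM with vSphere 5.1; treat them as
# assumptions and check current VMware documentation before relying on them.
VR_MIN_RPO_MINUTES = 15
VR_MAX_PROTECTED_VMS = 500
ABR_MAX_PROTECTED_VMS = 1000
# Hypothetical labels for the VM features VR does not support.
VR_UNSUPPORTED = frozenset({"linked_clones", "physical_rdm", "fault_tolerance"})


@dataclass
class Requirements:
    """Hypothetical summary of what the DR service level agreement demands."""
    rpo_minutes: float        # maximum tolerable data loss, in minutes
    protected_vms: int        # number of VMs to protect
    same_vendor_arrays: bool  # same array vendor at both sites?
    features: FrozenSet[str] = field(default_factory=frozenset)  # VM features in use


def viable_options(req: Requirements) -> List[str]:
    """Return the replication technologies that meet the requirements."""
    options = []
    # ABR: synchronous replication can meet any RPO, but needs same-vendor
    # arrays at both sites and is subject to its own protected-VM limit.
    if req.same_vendor_arrays and req.protected_vms <= ABR_MAX_PROTECTED_VMS:
        options.append("ABR")
    # VR: storage-independent, but cannot go below a 15-minute RPO, has a lower
    # VM limit and does not support some VM features.
    if (req.rpo_minutes >= VR_MIN_RPO_MINUTES
            and req.protected_vms <= VR_MAX_PROTECTED_VMS
            and not (req.features & VR_UNSUPPORTED)):
        options.append("VR")
    return options


# Database tier needing near-zero data loss on same-vendor arrays.
print(viable_options(Requirements(0, 50, True)))     # ['ABR']
# Web tier tolerating an hour of data loss, dissimilar storage at the DR site.
print(viable_options(Requirements(60, 100, False)))  # ['VR']
```

Where both options come back as viable, cost and existing storage investments decide between them; an empty list means neither technology meets the requirements on its own.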

The two technologies can be combined where desirable, within supported limits and taking the resulting RPO into account. For example, small branch sites could use VR to replicate to a main site, which is then protected using ABR. Alternatively, certain VMs could be replicated using VR to a cloud provider as well as to an ABR-linked site.
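For the branch-to-main-site example, a rough way to estimate the resulting RPO is to add up the worst-case lag of each hop. The Python sketch below makes exactly that simplifying assumption; the function name is hypothetical, and the 30-minute asynchronous ABR cycle in the usage lines is an assumed figure, not one from this article.

```python
from typing import Iterable


def cascaded_worst_case_rpo(stage_rpos_minutes: Iterable[float]) -> float:
    """Rough worst-case data-loss window, in minutes, for chained replication hops.

    Simplifying assumption: each hop can lag its source by up to its own RPO,
    and those lags add up end to end.
    """
    stages = list(stage_rpos_minutes)
    if any(rpo < 0 for rpo in stages):
        raise ValueError("RPO values cannot be negative")
    return sum(stages)


# Branch VM replicated by VR (15-minute RPO) to the main site, which is then
# protected by asynchronous ABR with an assumed 30-minute cycle.
print(cascaded_worst_case_rpo([15, 30]))  # 45
# With synchronous ABR on the second hop, VR's 15 minutes dominates.
print(cascaded_worst_case_rpo([15, 0]))   # 15
```

Summing the hops gives a conservative upper bound rather than an exact figure, which is usually the safer number to compare against an SLA.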

To summarise, VR is simpler, more flexible and cheaper than ABR, but at the expense of reduced performance, scalability and feature support. The decision on which technology to use, or whether to combine the two, should be driven by business requirements as well as any existing investments.

Paul Grimwood is a technical consultant at GlassHouse Technologies.