Uncomplicating high availability for SAP

From an IT perspective, SAP landscapes can be extremely complicated, involving multiple computing clusters tuned to support different core components in distinct ways. All the servers, all the clusters, and all the components need to interact predictably and reliably, yet many represent single points of failure in waiting. Unplanned SAP system downtime is rarely an option, so it’s extremely important that an IT team have the means to ensure the availability of all the disparate components that make up this landscape.

Given the complexity of this infrastructure, that can seem like a daunting challenge, but it need not be. A range of solutions, from high availability extensions (HAEs) to SAP-specific application recovery kits (ARKs) to SANless cluster solutions for storage and server failover management, can be deployed on-prem or in the cloud to ensure the high availability of an SAP landscape.

Let’s start with definitions. “High availability” (HA) is not an arbitrary term. It’s accepted to mean that something (a system or an application) will be available 99.99 percent of the time. What’s important to clarify, though, is what that something is. Many cloud SLAs, for example, guarantee only that at least one virtual machine (VM) in a cluster configured for HA will be available 99.99 percent of the time. If you have not configured your cluster so that every VM in it can run your application on its own, then in the midst of a serious natural disaster you may discover that you can reach a VM but not your key applications and data.
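
To make the 99.99 percent figure concrete, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name is purely illustrative, and the calculation assumes a 365-day year.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, that an availability target allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


budget = downtime_budget_minutes(99.99)
print(f"{budget:.2f} minutes of downtime per year")
```

In other words, the 0.01 percent that HA leaves over is less than an hour across a whole year, and that budget applies to whatever the SLA actually covers, which may be a VM rather than your application.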

Since we’re talking about SAP, clearly you want to ensure HA at the application layer. That can be accomplished in a variety of ways. One approach, which one might call the “big hammer” approach, involves preparing for an event of such magnitude that it takes the entire data center offline: a flood, for example, or a tornado. In the face of such a natural disaster, one could ensure the availability of the SAP landscape by completely failing over from the original infrastructure to a mirror configuration in a geographically distinct data center, one that is not affected by the flood or tornado.

Such events do happen, so it’s important to have a big hammer solution in your toolkit. But such events don’t happen often, and there are innumerable lesser events (network and hardware faults, application bugs, improperly patched code, and more) that can unexpectedly render some or all of your SAP landscape temporarily unavailable. Pulling out the big hammer and performing a full failover to a remote site would certainly ensure ongoing availability, but it’s a bit heavy-handed. There are other approaches, involving smaller hammers, if you will, that can resolve the lesser problems and deliver application availability more efficiently and with less disruption.

In short, ensuring HA of your SAP landscape is best achieved by a combination of tools.

Smaller hammers for everyday availability threats

For the day-to-day availability threats, HAEs — SUSE HAE is a well-known example — offer one approach to ensuring HA. HAEs are often open source options or solutions that are more or less bolted on to existing deployments. They’re built for whatever flavor of UNIX or Linux you are using (or built for the hardware on which you’re running), and they can watch for a variety of low-level OS and hardware events that might predictably lead to problems that will affect your SAP landscape. HAEs can then programmatically intervene before those problems actually occur. By doing so, they can stave off the problem that might have led to some element of your SAP landscape becoming unavailable.

The downsides of HAEs are twofold. First, they require a high degree of customization and scripting, not just to work but also to be stable. Second, and more at issue in an SAP environment, HAEs are application-agnostic at best and application-unaware at worst. That can lead to problems because an OS- or hardware-oriented HAE may not operate with any awareness of the unique characteristics of your SAP landscape. It will not know that certain elements within the landscape need to be loaded in advance of others. It will not be aware of any of the interdependencies that enable your ERP system to operate properly and, consequently, could respond automatically to events in ways that are actually antithetical to your goal of ensuring SAP availability.

An alternative approach to ensuring HA against day-to-day threats is to deploy a third-party HA monitor that is application-aware. These monitors are also known as application recovery kits (ARKs) and are designed with specific software deployments in mind. Like an HAE, an ARK designed for SAP central services or for SAP S/4HANA will monitor the health of infrastructure, but unlike an HAE it will do so with much greater awareness of the interdependencies and unique configuration characteristics of an SAP landscape. ARKs will watch for issues particular to an application environment that a generic HAE would not notice; simultaneously, an ARK may disregard issues that are not anomalous within the world of SAP central services or SAP S/4HANA but that a generic HAE might detect and try to “fix.” Finally, if an ARK designed for SAP detects a fault that warrants a programmatic response (the restarting of one or more processes, for example), its application awareness will ensure that the processes are restarted in the proper order, which a generic HAE will not necessarily do.
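
The idea of restarting processes “in the proper order” can be illustrated with a dependency-ordered restart. The sketch below is not how any particular ARK is implemented; the component names and the dependency map are hypothetical, chosen only to show how recorded interdependencies yield a safe start order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each key lists the components that must
# already be running before that component is started.
DEPENDS_ON = {
    "database": set(),
    "central_services": set(),
    "app_server": {"database", "central_services"},
}


def restart_order(depends_on):
    """Return a start order in which every component follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = restart_order(DEPENDS_ON)
print(" -> ".join(order))
```

A generic tool without such a map might restart the application server first simply because that is where it noticed the fault; an application-aware tool carries the map with it.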

Big hammers for the exceptional threats

That still leaves the matter of the catastrophic events that require those big hammers we mentioned earlier. If a critical server in your SAP landscape goes offline, whether it’s an application server, a database server, or some other key server, access to your SAP application is going to be compromised. The precipitating event need not be a natural disaster; it could be human error, or a network or electrical failure in one area of a data center. Regardless of the source of the outage, if you want to ensure the HA of your SAP application, you need a mechanism for failing over some or all of your SAP landscape to backup infrastructure that can take over when needed.

Here’s where a tool enabling HA clustering becomes critical. An HA clustering tool enables you to create a logical cluster of systems that can act together to support your application needs. The cluster nodes can reside in geographically distinct locations, so an event affecting one location (whether that be a tornado or an electrical fault) will be unlikely to affect cluster nodes in other locations. The HA aspect of the clustering tool ensures an orderly failover to a standby server — within the 0.01 percent downtime allowed within the definition of HA — in the event that the active server in the cluster goes offline.
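
As a rough illustration of how a cluster manager might decide when to fail over, the sketch below uses a heartbeat timeout. It is a simplified model under stated assumptions, not the logic of any specific clustering product: the `ClusterNode` type, the `choose_active` function, and the five-second timeout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterNode:
    name: str
    last_heartbeat: float  # seconds, read from any monotonic clock


def choose_active(active, standbys, now, timeout=5.0):
    """Keep the active node while its heartbeat is fresh; otherwise
    promote the first standby whose heartbeat is still fresh.
    The timeout is an assumed value, not a recommendation."""
    if now - active.last_heartbeat <= timeout:
        return active
    for node in standbys:
        if now - node.last_heartbeat <= timeout:
            return node
    raise RuntimeError("no healthy node available")


# Example: the active node in one site went silent 30 seconds ago.
site_a = ClusterNode("site-a", last_heartbeat=70.0)
site_b = ClusterNode("site-b", last_heartbeat=99.0)
print(choose_active(site_a, [site_b], now=100.0).name)
```

Real clustering tools add quorum and fencing so that two sites never both believe they are active; the sketch leaves those out.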

As noted earlier, though, for 99.99 percent availability of the SAP application, you need to ensure that all the critical components of your SAP landscape are available to the failover cluster node that is suddenly called into service. Given configuration best practices for SAP that call for running certain services — ERS and ASCS, for example — on separate servers, your HA goals will be furthered if you use an HA cluster management system that works closely with SAP-aware ARKs. That added application awareness can ensure the proper separation of SAP services — to the extent possible — even in a failover scenario in which numerous services may need to be moved to standby cluster nodes.
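
The “to the extent possible” separation of services can be pictured as a placement rule applied at failover time. The following is a minimal sketch, assuming a hypothetical rule set and hypothetical node names: prefer a standby node that does not already host a conflicting service, and fall back to the least-loaded node only when no such node exists.

```python
# Pairs of services that should not share a node (hypothetical rule set).
KEEP_APART = {frozenset({"ASCS", "ERS"})}


def conflicts(service, node_services):
    """True if the service should not run beside something already on the node."""
    return any(frozenset({service, other}) in KEEP_APART for other in node_services)


def pick_standby(service, standby_nodes):
    """standby_nodes maps a node name to the set of services it already runs.
    Prefer conflict-free nodes; otherwise accept any node rather than stay down."""
    candidates = [n for n, svcs in standby_nodes.items() if not conflicts(service, svcs)]
    pool = candidates or list(standby_nodes)
    # Least-loaded node first; the name breaks ties deterministically.
    return min(pool, key=lambda n: (len(standby_nodes[n]), n))


nodes = {"node-b": {"ERS"}, "node-c": set()}
print(pick_standby("ASCS", nodes))
```

The fallback branch is the design choice worth noting: keeping the application available takes priority over keeping the services apart.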

Don’t forget the data

One more thing: Beyond integrating SAP application awareness with the HA failover management system, a key to HA in an SAP landscape, particularly one that is running in the cloud, is ongoing access to the underlying SAP application data. And therein lies one more challenge. If the HA cluster you intend for your SAP landscape is built on-prem, you may be looking at using a storage area network (SAN) as your data repository. But in an HA scenario a SAN poses two distinct problems. In an SAP landscape built on-prem, a SAN constitutes a single point of failure that could render the entire landscape unavailable if the SAN were to go offline. If your SAP landscape is built in the cloud, the problem is even starker: A SAN cannot be configured in the cloud.

A solution to the SAN problem that ensures ongoing access to your SAP application data from any node in an on-prem or cloud-based cluster lies in the use of SANless clustering software. SANless clustering software continuously performs high-speed, block-level data replication between storage attached directly to the active cluster node and storage attached to each secondary node in an HA cluster. Should the active node go offline, for any reason, the HA cluster management system can immediately fail over to one of the backup servers, where, because of the replication features of the SANless clustering software, all the SAP application logic and data are ready and waiting to be called into service.
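
The replication idea can be modeled in a few lines. This is a toy, in-memory sketch only: the class and method names are hypothetical, dictionaries stand in for locally attached disks, and writes are copied synchronously. Actual SANless clustering software works below the file system and must also handle write ordering, network failures, and resynchronization.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block number -> bytes; stands in for local storage


class ReplicatedVolume:
    """Every write to the active node is copied to each secondary node."""

    def __init__(self, active, secondaries):
        self.active = active
        self.secondaries = list(secondaries)

    def write(self, block_no, data):
        self.active.blocks[block_no] = data
        for node in self.secondaries:
            node.blocks[block_no] = data  # synchronous copy, for simplicity

    def read(self, block_no):
        return self.active.blocks[block_no]

    def fail_over(self):
        """Promote the first secondary; its local copy is already current."""
        self.active = self.secondaries.pop(0)
        return self.active


volume = ReplicatedVolume(Node("primary"), [Node("standby-1"), Node("standby-2")])
volume.write(0, b"sap data")
volume.fail_over()
print(volume.active.name, volume.read(0))
```

Because each node holds its own copy, no shared storage device sits in the failover path.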

This being an SAP landscape, all this sounds complicated, but in truth it’s less so than it sounds. HA monitoring tools such as ARKs can help you keep everyday problems from compromising application access. HA clustering tools that work closely with SAP ARKs can ensure an orderly failover between cluster nodes in the event that active infrastructure suddenly goes offline. And failover can be rapid enough that your expectations for HA are met, particularly if you have ensured that all the data required by your SAP environment is always available to all the nodes in your HA cluster through use of SANless clustering tools.

Ian Allton, Solutions Architect, SIOS Technology

Ian Allton, Solutions Architect at SIOS Technology, has spent 20+ years assisting enterprise IT teams supporting, implementing, and servicing complex storage, NAS, virtual, and clustering systems running on Linux platforms.