Unpredictable data growth and equally hard-to-estimate performance demands from users mean that datacentre operators are increasingly turning to converged infrastructure for the flexibility they now need. If converged infrastructure is to achieve its full potential, however, careful attention must be given to the way it is powered: traditional power architectures are rarely the best solution. We interviewed Tatu Valjakka, the software and connectivity product manager of the power quality division for EMEA at Eaton, to find out more.
What is converged infrastructure?
Converged infrastructure is essentially a modular approach to datacentre design and construction. Its basic building block usually comprises CPU and memory, storage, network connection devices and, crucially for today's systems, provision for virtualisation. Physically, this building block can be realised in many different ways. It could, for example, be a bundled pod, a pre-populated rack or even a container. Whichever option is chosen, decisions have to be made about providing power for the modules and, as always, these decisions will be influenced by the size and power consumption of the installation, and by availability and resilience requirements.
How does convergence affect the way resilience is achieved?
Traditionally, resilience has been considered to relate primarily to the hardware and operating system, and every effort has been made to make these as reliable as possible so that they provide a strong, stable foundation for the application software. In a system that uses this philosophy, the power infrastructure will be configured to support maximum hardware availability and will incorporate, among other things, a high level of redundancy. This is a perfectly valid approach, and for users who must have guaranteed availability without even short-term performance degradation, it remains the best approach.
The problem is that a system that relies heavily on the hardware layer for resilience is static and may be hard to modify. It's also likely to be expensive. Fortunately, converged infrastructure opens up other possibilities for achieving resilience. One of the most attractive is to build resilience into the cloud or virtualisation layer by adopting a cluster approach that's prepared to deal with failures in the lower layers. It can do this, for example, by moving virtual machines to hardware that is unaffected by the failure, by restarting virtual machines or even by using public cloud services as a back-up site.
If resilience isn't hardware-based, is power management still required?
There's a very dangerous delusion currently in circulation, that if the software layers of an IT system can handle failures in the physical layer, power protection and power management become optional or even completely unnecessary. In reality, nothing could be further from the truth. Power protection is essential in all systems. Properly implemented, it will filter out transients and other fluctuations, providing the IT hardware with invaluable protection against damage. It can also protect against another potentially serious problem – zombie servers. These are machines that work erratically but haven't failed completely. A power management system can provide 'fence' mechanisms that turn the zombies off.
However, the really crucial reason for providing power management and power protection in systems where resilience is provided above the hardware layer is that, for higher-level resilience strategies to work, the layers providing the resilience must always be power aware. Consider, for example, a resilience strategy that involves migrating virtual servers to remote hardware if the mains supply fails. Without power management, how can the virtualisation manager be aware of the mains failure and know that it should initiate migration?
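As a rough illustration of what being power aware can mean in practice, the Python sketch below shows the kind of decision a virtualisation manager could take when the power management system reports a change of state. Everything in it is hypothetical: `PowerEvent`, `decide_action`, the state names and the fallback to a shutdown when there is too little runtime left are assumptions made for the example, not the interface of any particular UPS or hypervisor product.

```python
from dataclasses import dataclass
from enum import Enum


class PowerState(Enum):
    ON_MAINS = "on_mains"
    ON_BATTERY = "on_battery"


@dataclass
class PowerEvent:
    """Hypothetical notification sent by the power management system."""
    site: str
    state: PowerState
    runtime_remaining_s: int  # estimated battery runtime, in seconds


def decide_action(event: PowerEvent, migration_time_s: int) -> str:
    """Return the action a power-aware virtualisation manager might take."""
    if event.state is PowerState.ON_MAINS:
        return "none"
    # On battery: migrate only if the estimated runtime covers the move
    # (assumed policy for this sketch).
    if event.runtime_remaining_s > migration_time_s:
        return "migrate"
    return "shutdown"
```

Without the event in the first place, none of these branches can ever be reached, which is the point being made above: the resilience logic depends on information that only the power layer can supply.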
What options for achieving resilience does good power management create?
We've already mentioned the first option, where the power management system initiates the transfer of applications from a site or server facing imminent power disruption to another server or site where the power is good. With this option, the full service provided by the IT system is available at all times, although users are likely to experience a short-term slowdown while the transfer of applications is taking place.
Another option for achieving resilience in the event of power problems is to suspend non-critical machines and migrate the critical ones to a back-up site or server. This is often acceptable in manufacturing applications where power loss means that the production plant is shut down, so there is little point in maintaining the IT services directly associated with it.
The third option doesn't involve the transfer of virtual machines. Instead, when there's a mains power problem, non-critical virtual machines are shut down, and the remaining essential virtual machines are consolidated on a small number of physical servers on the same site. The system can then continue to operate on standby power until the batteries are almost fully discharged, at which point a graceful shutdown can be executed. This option allows long runtimes to be achieved for essential services with a UPS installation of only modest size.
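The third option can be sketched in a few lines of Python. The example assumes a simplified model that is not taken from any real product: each virtual machine has a single hypothetical `cpu` demand figure, all hosts have the same capacity, and essential machines are packed with a simple first-fit-decreasing heuristic. The names `VM` and `plan_consolidation` are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    critical: bool
    cpu: int  # hypothetical demand, in arbitrary capacity units


def plan_consolidation(vms, host_capacity):
    """Return (names of VMs to shut down, critical VM names grouped per host)."""
    to_shut_down = [vm.name for vm in vms if not vm.critical]
    # Place the largest essential VMs first (first-fit decreasing).
    critical = sorted((vm for vm in vms if vm.critical),
                      key=lambda vm: vm.cpu, reverse=True)
    hosts = []  # VM names placed on each host that stays powered
    free = []   # remaining capacity of each of those hosts
    for vm in critical:
        if vm.cpu > host_capacity:
            raise ValueError(f"{vm.name} does not fit on a single host")
        for i, remaining in enumerate(free):
            if vm.cpu <= remaining:
                hosts[i].append(vm.name)
                free[i] -= vm.cpu
                break
        else:
            # No powered host has room, so keep one more host running.
            hosts.append([vm.name])
            free.append(host_capacity - vm.cpu)
    return to_shut_down, hosts
```

Every host that does not appear in the returned plan can be powered down, which is what stretches the battery runtime for the essential services.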
How is power protection implemented in converged systems?
Three basic approaches are possible: centralised, end-of-row and rack- or container-based provision. The centralised approach, where a single UPS powers the whole site, is tried and tested, and it is worth noting that when higher layer resilience is in place, it is often possible to specify a UPS with a shorter runtime than would otherwise be needed. When adopting a centralised approach, it is usual to plan from the outset for maximum capacity, but to add power modules only as the load grows.
End-of-row power protection systems can have a single or double power bus, and can be modular or provide full capacity from the outset. They allow redundancy schemes to be implemented and they provide links to virtualisation, but they have no special advantages or disadvantages in converged applications.
Rack/container-based systems are, however, particularly well suited to converged applications. With this implementation, power protection is an integral part of each basic unit: typically, each computing pod comes with its own power system. This means that the power system has the same modularity as the IT equipment, and it therefore scales automatically. It is also straightforward to provide 1+1 redundancy with dual power supplies in the servers.
How can power efficiency be optimised with fluctuating loads?
A characteristic of most converged and virtualised systems is fluctuating application load, particularly when strategies like 'following the moon' (moving workloads to sites where it is currently night-time and energy is typically cheaper) are adopted. The changes in application load are reflected by similar changes in power demand. Unless the power protection system has been designed with this in mind, the result is likely to be poor energy efficiency because UPS systems deliver optimum efficiency only when they are running at high load levels.
The solution is to specify a power system made up of several UPSs that share the load and are complemented by intelligent multi-UPS management. The UPS control system ensures that, at any given instant, only those UPSs needed to meet the current power demand are operational; the others are held in a standby state where they consume almost no power, but they can be brought back on load in just a few milliseconds when demand increases.
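The arithmetic behind this is simple enough to sketch. The Python below assumes identical UPSs that share the load equally and ignores any redundancy margin a real installation would keep; the function names and the kW figures in the note that follows are illustrative assumptions rather than data from a specific product.

```python
import math


def active_units(load_kw: float, unit_kw: float, installed: int) -> int:
    """Smallest number of equal-sized UPSs able to carry the load."""
    if load_kw < 0 or unit_kw <= 0:
        raise ValueError("load must be non-negative and unit rating positive")
    # Keep at least one unit on load even when demand is very low.
    needed = max(1, math.ceil(load_kw / unit_kw))
    if needed > installed:
        raise ValueError("load exceeds installed UPS capacity")
    return needed


def load_fraction(load_kw: float, unit_kw: float, units_on_load: int) -> float:
    """Fraction of its rating that each on-load unit carries."""
    return load_kw / (unit_kw * units_on_load)
```

With four hypothetical 50 kW units and a 70 kW load, `active_units(70, 50, 4)` gives 2, so each on-load unit runs at 70% of its rating. Sharing the same load across all four would leave each at 35%, well away from the high load levels where UPSs are most efficient.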
This arrangement means that the load is always concentrated on the minimum number of UPSs needed to meet it. As a consequence, these UPSs are well loaded and, therefore, operating efficiently. It is, in fact, possible to take the modular concept further by specifying UPSs that are themselves made up of modules that can be instantly transitioned between standby and operational mode. This arrangement, which is usually called variable module management (VMM), allows the UPS capacity to be accurately matched to the power demand over a very wide range of loadings, ensuring that high efficiency is achieved under all conditions, irrespective of load fluctuations.