In bicycle racing there is a phenomenon known as a “peloton” (not to be confused with the high-tech spin bike), a word derived from the French term for “platoon.” It happens when riders form a tightly massed group, reducing drag for those who are slipstreaming behind the leader. A primary benefit, of course, is that the peloton helps individuals in the pack save energy -- which is important for riders who can later take advantage of that extra energy and sprint to the finish.
In today’s hyperscale data centres and cloud storage, the idea of a peloton is emerging with similar benefits. In this case, it centres on managing multiple solid-state drive (SSD) resources as a single pool for better results and speeds. Standard operating systems manage each SSD as an individual unit, without awareness of neighbouring SSDs. Each SSD has an embedded flash translation layer that is concerned solely with that specific SSD. But managing multiple SSDs as one large pool means they can share the workload, avoid single points of failure and bring about other benefits.
Understanding this emerging trend, however, requires a quick look back at recent breakthroughs in storage for these groups, including the following:
- SSDs vs. HDDs
- In the last several years, many companies began using flash memory-based SSDs instead of hard disk drives (HDDs). Because SSDs have no moving parts and store data in microchips, they are more energy efficient and much faster than HDDs, particularly for random reads and writes. Every SSD has a layer known as a flash translation layer (FTL) that manages internal flash memory, including address translation, wear levelling, bad-block management and garbage collection. Rapidly falling prices for flash memory are speeding the move to SSDs.
- Rise of DAS and Understanding its Limitations
- Several years ago, direct-attached storage (DAS) built on SSDs was the best and only option, and it worked well for installing one or two SSDs per application server. SATA SSDs offered a simple, direct physical interface to the application CPU. Installing thousands of SSDs into application servers can be time consuming, but if there are problems, spotting the problem SSD is easy enough. DAS often works well in smaller installations. The greatest downside of DAS is that it offers limited storage space and limited storage performance, because each server can only be attached to a fixed number of SSDs (often just two). Adding more can lead to stranded capacity, imbalance and lower performance. Additionally, adding more SSDs and dealing with faulty SSDs requires (expensive) data centre administrators and technicians, as well as more physical space. Many large-scale installations do not bother dealing with faulty SSDs: as soon as one SSD in a server fails, the entire server is declared failed and put out of commission.
- NVMe in Storage
- More recently, NVMe got a nod as an emerging solution for storage because it is built on PCIe, a faster and more scalable interface. The NVMe protocol was designed to work very efficiently in multi-core environments, meaning NVMe opens options for parallelism. In fact, this standardised storage interface is designed for high-performance, low-latency SSDs. NVMe over Fabrics (NVMe-oF), which extends NVMe over networks, is now generally regarded as the best way to scale storage and bring storage into the data centre and the cloud.
- Emerging NVMe Solutions
- Solutions built around NVMe are far better than prior solutions because they are built for SSDs rather than HDDs and they are eminently scalable. NVMe-oF was developed specifically for cloud-native applications to break the bond between storage and the servers previously hosting it, enabling servers to access SSDs over the network. However, NVMe-oF does not address the fundamental problem of SSDs being independent hardware entities, regardless of whether the application accessing them is local or remote. Because each SSD is managed individually by its local flash translation layer, using the pool of SSDs as a shared resource available to multiple application servers requires a new approach.
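To make the per-SSD flash translation layer described above a little more concrete, here is a minimal sketch of two of its jobs, address translation and garbage collection. It is a toy model and not how any real SSD firmware is written: the class name `ToyFTL` and its methods are hypothetical, and it assumes page-sized writes and ignores wear levelling, bad blocks and the block-level erases that real flash requires.

```python
class ToyFTL:
    """Toy flash translation layer for ONE SSD (illustrative assumption only)."""

    def __init__(self, num_pages):
        self.mapping = {}                    # logical page -> physical page
        self.free = list(range(num_pages))   # physical pages ready to be written
        self.invalid = set()                 # stale pages waiting for garbage collection
        self.flash = {}                      # physical page -> stored data

    def write(self, logical_page, data):
        # Reclaim stale pages only when no free page is left.
        if not self.free:
            self.collect_garbage()
        if not self.free:
            raise RuntimeError("SSD is full")
        physical = self.free.pop(0)
        old = self.mapping.get(logical_page)
        if old is not None:
            # Flash is not overwritten in place: the old copy is marked stale.
            self.invalid.add(old)
        self.flash[physical] = data
        self.mapping[logical_page] = physical
        return physical

    def read(self, logical_page):
        # Address translation: look up where the logical page currently lives.
        return self.flash[self.mapping[logical_page]]

    def collect_garbage(self):
        # Return every stale page to the free list.
        for physical in sorted(self.invalid):
            del self.flash[physical]
            self.free.append(physical)
        self.invalid.clear()
```

The point to notice is that everything in this sketch is local to one drive: the mapping table, the free list and the garbage collector know nothing about any neighbouring SSD.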
The need for disaggregation
The issues with running independent SSDs, even with NVMe, are exacerbated by quality of service issues. With multiple types of applications served by a single server in a DAS environment, quality of service is not a guarantee. In fact, the only way to gain that assurance is by having a layer between the front-end networking and the backend SSD, which allows for a complete understanding of the workload (what is the priority of the specific stream, how much bandwidth each client consumes, read vs. write and similar concerns).
This is the best argument for implementing a truly disaggregated system in which storage and compute can be separated and even scaled independently. Disaggregation’s benefits include maximising utilisation, reducing TCO, improving user experiences, supporting larger numbers of users, and adapting easily to a cloud-native (container-based) environment -- all while being easy to maintain and scale.
The problem of flash
Taking advantage of both flash and disaggregation is only possible through an efficient storage management layer: that “peloton” idea mentioned above. Storage solutions for data centres today often oversee and manage a huge pool of individual SSDs. That’s like each rider in a bike race running the course solo with no advantage from grouping riders.
The new idea is that rather than treating each SSD as a discrete element, the SSDs can instead be exploited as a pool of elements. The benefits of this approach are easily understood: pooling eliminates the “single point of failure” problem; SSD hardware can become more resilient because individual SSDs do not have to work as hard; maintenance can be centralised; it’s easier to manage the I/O workload, data reduction and garbage collection; and scaling up or down is far simpler. These combined efficiencies can lead to a reduction in TCO of up to 50 per cent.
The challenge with this novel approach, however, is in the flash. That flash translation layer that resides within each SSD is meant for managing an individual SSD, not a group.
FTL’s evolution to global FTL
The thin client model of NVMe-oF is essentially an extension of the wire, so rather than the application accessing the SSD locally, the wire is extended to another server (storage server). While the range is extended, it is still either a “physical connection” within the same rack or between racks anywhere in the datacentre. Solving this problem involves allowing applications from across the data centre to use the same resources, which is possible by adding a new flash management layer.
That new flash management layer now exists. Known as Global FTL (GFTL), it optimises individual SSDs by implementing a global FTL across the pool of SSDs. It’s essentially the “hyper-peloton” for storage.
The GFTL manages multiple SSDs within multiple servers, turning them into one shared pool. This pool can be accessed via NVMe-oF and, in particular, NVMe/TCP. This means users no longer need to worry about the physical location of the SSDs, no matter where they are in the data centre – whether the SSDs are side by side, in the same rack, on another floor or even in a separate zone within the data centre.
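One way to picture a shared pool is as a single mapping table that sits above all the drives. The sketch below is an assumption made for illustration, not a description of how GFTL is actually implemented: the class `ToyGlobalPool`, the server names and the "place each new block on the least-full SSD" policy are all hypothetical.

```python
class ToyGlobalPool:
    """Toy pool-wide mapping across SSDs in several servers (illustrative only)."""

    def __init__(self, ssds):
        # ssds: dict of (server, ssd_index) -> capacity in blocks
        self.capacity = dict(ssds)
        self.used = {ssd: 0 for ssd in ssds}
        self.table = {}  # (volume, block) -> (server, ssd_index)

    def write(self, volume, block):
        key = (volume, block)
        if key in self.table:
            # Block already placed: keep it where it is.
            return self.table[key]
        candidates = [s for s in self.capacity if self.used[s] < self.capacity[s]]
        if not candidates:
            raise RuntimeError("pool is full")
        # Assumed policy: pick the least-full SSD, breaking ties by name.
        target = min(candidates, key=lambda s: (self.used[s] / self.capacity[s], s))
        self.used[target] += 1
        self.table[key] = target
        return target

    def locate(self, volume, block):
        # The application asks for (volume, block); only the pool knows the drive.
        return self.table[(volume, block)]
```

With two hypothetical servers, `ToyGlobalPool({("srv-a", 0): 2, ("srv-b", 0): 2})`, consecutive writes to one volume alternate between the servers, and the application only ever refers to the volume and block number, never to a physical drive.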
Better performance through GFTL
The GFTL approach in some ways seems to defy the laws of physics. SSDs connected by fibre within a high-speed data centre rely on the speed of light. That speed, however, is finite, so data stored locally is by definition faster to access, and data stored farther away takes longer to access.
With GFTL, however, users can realise better performance and lower latency anywhere – despite the collection of drives being located anywhere across the data centre. The smart flash management of the GFTL helps eliminate latency spikes so the overall performance of the applications is often better than if the drives were still located within the same server. At the same time, the sys-admin enjoys the cost-saving and maintenance advantages of storage pooling and disaggregation.
Pooling SSDs provides data centres and cloud providers the benefits of scale previously reserved only for the largest hyperscalers. That’s because the combination of the NVMe over Fabrics protocol and GFTL splits and allocates resources to different applications, improving the performance per application by understanding how to deal with different workloads and optimising the pool of resources. In this manner the first application does not grab all the resources. For example, in cloud environments, multiple customers or applications can use the same resources; data centre providers, however, want to serve their customers based on the service-level agreement (SLA). The GFTL works for both scenarios, allowing the system to virtualise all the SSDs as a shared pool to get the most out of them, gaining better performance and latency and much better utilisation from the same media, with application-level quality of service (QoS).
The GFTL, combined with NVMe/TCP fabric technology, means the range of the storage capabilities is extended as well. This new paradigm for storage is like opening the entire bike racecourse to every rider to get to the finish line as quickly as they can rather than limiting riders to 100 or so miles each day of the race. And with pooled resources, the racers in this case would pull energy from each other so they could all win the race. Rather than drawing on the power of each individual SSD separately, the system combines the power of them all working in concert, which magnifies the technology.
Putting the power of the SSDs together, separating read and write, and creating different priority-based pipelines within the system means users can gain the best service, quality and speeds for each application.
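As a rough illustration of separating reads from writes and building priority-based pipelines, the sketch below keeps one queue per priority level and operation type. Everything in it is a hypothetical assumption: the class `ToyPipelines`, the three priority names, and the policy of serving higher priorities first and reads ahead of writes within a priority. The source does not specify the actual scheduling policy.

```python
from collections import deque


class ToyPipelines:
    """Toy priority pipelines with separate read and write queues (illustrative only)."""

    OPS = ("read", "write")

    def __init__(self, priorities=("high", "normal", "low")):
        self.priorities = priorities
        # One FIFO queue per (priority, operation) pair.
        self.queues = {(p, op): deque() for p in priorities for op in self.OPS}

    def submit(self, app, op, priority):
        self.queues[(priority, op)].append(app)

    def next_request(self):
        # Assumed policy: highest priority first; within a priority, reads before writes.
        for p in self.priorities:
            for op in self.OPS:
                queue = self.queues[(p, op)]
                if queue:
                    return (queue.popleft(), op, p)
        return None  # nothing pending
```

A strict-priority scheme like this is the simplest to read but can starve low-priority traffic under sustained load; a real system would more likely use weighted or rate-limited scheduling to honour each SLA.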
Eran Kirzner, CEO, Lightbits Labs