According to IDC, the next few years will see data grow into the hundreds of zettabytes, with 80 percent of that growth coming from unstructured data. Object storage was specifically designed to handle the massive scale of unstructured data created by the digital economy. Powered by innovations in software-defined principles and flash storage media, the transition to high-performance object storage is now accelerating in the unstructured data market, driven by organizations looking to achieve specific performance metrics or to future-proof their next-generation data centers.
Back to basics: what is object storage?
File, object and block are the three main types of data storage. Most of us are familiar with file systems; we use them every day to store files on our laptops. File systems provide a hierarchical, directory-based view of unstructured data stored as files. This makes it easy to organize data by topic or category, and conveys meaning through file names, types and additional attributes. However, the overhead of maintaining this organizational structure limits scalability in the number of files and directories, and reduces efficiency when locating or searching for data. Just think of how much time you have spent combing through your own directories to find a specific file you need for other purposes. Ultimately, file systems work, but they can crumble under their own weight as the numbers scale - which is clearly the case today.
In contrast, object storage is designed for very large-scale storage requirements - for example, the hundreds of millions to many billions of objects seen today in enterprise and cloud applications. Object storage simplifies data access at scale with two key characteristics:
- a flat (non-hierarchical) namespace “view” that can grow without bounds, eliminating the scaling issues of file systems;
- a simple key-based access mechanism, where each object has a unique identifier (the “key”) that is used to locate and access (read/write) the data stored in the object.
This all sounds very straightforward, but in its early days a main barrier to widespread adoption of object storage was the lack of a standard access protocol like those file systems have had for years (NFS, SMB). That barrier has largely been eliminated through the adoption of the AWS S3 API as a de facto standard across applications, storage systems and cloud services.
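The flat namespace and key-based access described above can be sketched in a few lines of Python. This is a simplified in-memory model for illustration only - not a real object store - and the class and method names are invented for the example:

```python
class FlatObjectStore:
    """Minimal in-memory sketch of a flat, key-addressed object store.

    There is no directory hierarchy: a key like "videos/2023/clip.mp4"
    is just an opaque string in one flat namespace, not a nested path.
    """

    def __init__(self):
        self._objects = {}  # key -> bytes

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = ""):
        # "Directories" are simulated purely by prefix filtering,
        # much like listing objects under a prefix in the S3 API.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("videos/2023/clip.mp4", b"\x00\x01")
store.put("logs/app.log", b"line1\n")
print(store.list_keys("videos/"))  # ['videos/2023/clip.mp4']
```

With the S3 API, these operations correspond roughly to PutObject, GetObject and a prefix-filtered object listing - the store itself never maintains a directory tree, which is what lets the namespace grow without bounds.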
Last but not least, block storage lies underneath all file systems and object storage: the fundamental fixed-size data blocks stored on physical disk drives. Block storage is still used directly by some applications, such as databases, and is still the access protocol exposed on storage area networks (SANs), the standard network model for block storage. The downside for users and applications is that, since every block on the storage system is effectively identical apart from its binary contents, blocks carry none of the data “meaning” we get with file systems or object storage. However, modern storage virtualization and SAN solutions have made them somewhat easier to manage.
Growth of use cases
Previously associated mainly with archival use cases for large volumes of unstructured data, object storage addresses a much broader range of use cases and application performance requirements today. From conversations with end users, these are the use cases that can most benefit from high-performance object storage:
- Media content delivery - online content delivery or streaming of recorded content such as from Cloud DVRs. Delivery of live TV is more demanding, requiring stable latency, without any spikes that would cause video glitches or delays.
- Data analytics - use cases that will demand higher performance include those that analyze vast amounts of unstructured or semi-structured data, for example in financial fraud detection, travel services, healthcare for pattern detection and many more.
- IoT/edge data - data from edge-based sensors, meters, devices, cameras and vehicles, such as video data and logs.
Performance is not just performance
Before looking at how object storage is speeding up to meet the demands of these use cases, the metrics of performance must be considered. Performance is more than a simple race to the finishing line. When it comes to high-performance storage, two metrics have typically been used: IOPS (the number of input/output (IO) operations per second a system can deliver) and latency (the time to access the data, typically measured in milliseconds). For structured data such as relational databases, these metrics are key. Block storage in the form of SANs, along with high-end NAS systems, is optimized for these metrics, and is therefore very well suited to transactional, database-like use cases that depend on these attributes.
A third metric must also be taken into account when measuring performance for massive volumes of unstructured data: throughput, a measure of how fast content can be delivered from storage to the application. Since many types of unstructured data are large files (images and video files can be multiple megabytes to gigabytes, and even larger), the key performance metric is how fast those files can be read or written, in megabytes or gigabytes per second. For many new applications, fast object storage will also need to focus on high-throughput delivery of single and multiple files simultaneously.
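The relationship between object size, request rate and throughput can be made concrete with a back-of-the-envelope calculation. The numbers below are hypothetical, chosen only to show why large-object workloads are throughput-bound while small-IO workloads are IOPS-bound:

```python
def throughput_gb_per_s(object_size_mb: float, objects_per_second: float) -> float:
    """Aggregate throughput in GB/s for a stream of same-sized objects."""
    return object_size_mb * objects_per_second / 1000.0

# A hypothetical workload reading 64 MB video segments at 300 objects/s
# needs only 300 "IOPS" but a very high data rate:
print(throughput_gb_per_s(64, 300))   # 19.2 GB/s

# Delivering the same 19.2 GB/s with 4 KB blocks would instead require
# millions of small IOs per second - a completely different profile:
print(19.2e9 / 4096)                  # ~4.7 million IOPS
```

This is why throughput, not IOPS, is the metric that matters most for the media, analytics and IoT workloads described above.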
How is object storage boosting performance?
For years, object storage suppliers have leveraged the low latency and high IOPS that flash media provides, but mainly to accelerate object metadata - not data. For example, lookups of object keys, and operations such as modifying metadata attributes and listings, are optimized today by keeping object metadata on fast flash devices. This effectively shields the higher-capacity spinning disks (HDDs) until the actual data payload is required.
Since flash media has offered lower density and higher cost than spinning disks, this has been the best way to deliver an optimal blend of price vs. performance, especially for very large data capacities where customers need to keep costs down. More recently, the reduction in the cost of flash storage and new high-density flash such as Quad Level Cell (QLC) media is changing the game in terms of flash economics. This will enable object storage vendors to enhance performance for the use cases described above while keeping the cost down.
Here are a few current real-world examples of high-performance object storage deployments:
- A large service provider runs a cloud webmail service on over 100 high-density storage servers with nearly 5,000 disk drives, managed by software-defined object storage. The system today stores over 230 billion objects, is sized for a peak load of 1.6 million IOPS, and delivers a sustained load of 800,000 IOPS all day.
- A travel service provider ingests 1 petabyte of logs per day and retains them for 14 days as a rolling window. The high-performance object storage platform provides 20 GB per second of sustained write throughput, with peaks of 60 GB per second (while simultaneously deleting the oldest data).
- A European telecommunications and content delivery provider delivers live TV from content stored on object storage with very low (< 5ms) latency access guaranteed to eliminate any video jitter or glitches.
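On S3-compatible object storage, a rolling retention window like the 14-day log example above can typically be expressed declaratively with a bucket lifecycle rule, rather than by an application deleting objects itself. A sketch of such a rule follows; the rule ID and the "logs/" prefix are illustrative, not taken from the deployment described above:

```json
{
  "Rules": [
    {
      "ID": "expire-logs-after-14-days",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Expiration": { "Days": 14 }
    }
  ]
}
```

Once applied to a bucket, the storage system expires any object under the prefix 14 days after creation, sustaining the rolling window without extra client-side delete traffic.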
High-performance object storage has now emerged as a predominant way for enterprise IT to keep pace with the digital economy and unleash the power of the vast volumes of unstructured data organizations are storing.
Paul Speciale, Chief Product Officer, Scality