How enterprise storage has coped with 30 years of data growth

Over the past three decades, the storage industry has changed significantly, not just in terms of evolving technologies, but also with regard to the volumes of data we produce and therefore the amount of storage we require. Today, we are accustomed to talking about terabytes and even exabytes when it comes to our data storage needs. But 30 years ago things were very different.

Consider CCTV in the 1980s. Surveillance cameras produced low-resolution images, far below today's quality, and with the introduction of the VCR the industry was moving from real-time, in-person monitoring to recording onto tape for playback and archive. The challenges were tape capacity and the sequential nature of tape, which made access to specific sections of footage fairly slow.

Fast-forward three decades and we see very different challenges, not least because of the emergence of high-resolution imagery. Indeed, driven by the growth of smartphones, big data analytics and social media, there has been an explosion in the data storage industry. Market intelligence provider IDC recently showed that the amount of data used globally is doubling in size every two years, and by 2020 the digital universe – the data we create and copy annually – will reach 44 zettabytes, or 44 trillion gigabytes. Consumer and business demand for this information in real-time is only exacerbating the challenge presented.

Adapting to high resolution

For the past decade, television and computers have been leapfrogging one another in display technology. The introduction of high definition (HD) took television to the next step: when transmitted at two megapixels per frame, HDTV provides about five times as many pixels as its standard-definition (SD) counterpart. But IT caught up very quickly and high-end computer displays have since far surpassed HD television sets. We are now expecting televisions to leapfrog computers again with the introduction of 4K, offering a minimum display resolution of 3,840 x 2,160 pixels, four times the pixel count of HDTV.
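
The pixel arithmetic behind these comparisons can be checked with a short Python sketch. The SD figure of 720 x 576 (the PAL format) is an assumption, as are the function and variable names; the HD and 4K resolutions are the standard ones.

```python
# Pixel counts per frame for the formats discussed above.
# Assumption: SD is taken as 720 x 576 (PAL); other SD formats give a slightly different ratio.
SD = (720, 576)
HD = (1920, 1080)
UHD_4K = (3840, 2160)


def pixels(resolution):
    """Return the number of pixels in one frame."""
    width, height = resolution
    return width * height


def ratio(a, b):
    """Return how many times more pixels format a has than format b."""
    return pixels(a) / pixels(b)


print(f"HD frame: {pixels(HD) / 1_000_000:.2f} megapixels")  # about two megapixels
print(f"HD vs SD: {ratio(HD, SD):.1f}x")
print(f"4K vs HD: {ratio(UHD_4K, HD):.1f}x")
```

Every extra pixel has to be stored and moved, so, all else being equal, a fourfold jump in pixels per frame means roughly a fourfold jump in uncompressed data per frame.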

So, what does this mean for the storage industry today? With the advent of HD technology, video content providers now have to process huge volumes of information at speeds approaching those of today's best computers. Growing pressure on content providers to deliver high-quality, high-frame-rate transfers over high-bandwidth links means that data needs to be stored effectively and retrieved easily. Whereas 30 years ago strings of data were stored on disk at random, today content providers, particularly those operating online, need to recall vast sets of high-quality data rapidly. The challenge is therefore accessing data, rather than storing it.

Adapting to the fast-moving Internet sphere

The rapid development of an Internet-connected world and 24/7 ecommerce has brought a whole new set of storage requirements and challenges, including the analysis of big data, defined by Gartner as "high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." With an explosion of information, governmental agencies are investing more in the tools needed to manage such volumes of bits and bytes, directing most of their investment towards big data analytics capabilities. In the US, federal agencies spent about US $4.9 billion (£3 billion) on big data resources in 2012, according to IT consultancy Deltek.

But this has brought challenges on a technical level, too. While data centres commonly dealt with one type of data in the past, they now need to handle two at the same time: large-file data, such as images and video, and small log-file data, generally captured from sensors. The two are very different in nature and call for different storage systems, one designed for large-file sequential I/O and the other for small-file random I/O. Large files are normally read and written sequentially, whereas sensor logs can amount to billions of small files that are accessed randomly.

The intelligent allocation of resources

Compared to 30 years ago, storage solutions today are flexible and able to freely accommodate a mix of drive types, such as SAS, SATA and SSD, to allow for the most effective allocation of storage resources. The introduction of intelligent software enables more efficient storage usage and allows arrays to cater to the specific needs of a video content provider by, for example, placing high-performance, real-time workflows alongside less demanding visual effects.

The explosion in drive capacities brings with it the issue of long rebuild times in the event of a disk failure within a RAID set. Rebuild times for large-capacity drives can be on the order of several hours, if not days, leaving the storage system in a degraded, more vulnerable state while data sets are being rebuilt.
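
To see why rebuilds stretch into hours or days, a back-of-the-envelope estimate helps. The Python sketch below simply divides drive capacity by a sustained rebuild rate. The 4 TB drive and the 100 MB/s and 20 MB/s rates are hypothetical figures chosen for illustration, not measurements from any particular array, and the function name is our own.

```python
def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Estimate the hours needed to rewrite one whole drive at a steady rate.

    Uses decimal units (1 TB = 1,000,000 MB) and ignores RAID level,
    controller overhead and competing host I/O, so treat it as a best case.
    """
    capacity_mb = capacity_tb * 1_000_000
    seconds = capacity_mb / rebuild_mb_per_s
    return seconds / 3600


# Hypothetical 4 TB drive rebuilt on an otherwise idle array.
print(f"Idle array: {rebuild_hours(4, 100):.1f} hours")

# The same drive when production traffic throttles the rebuild.
print(f"Busy array: {rebuild_hours(4, 20):.1f} hours")
```

Under these assumptions the idle rebuild takes around 11 hours and the throttled one more than two days, and the array is exposed to a second failure for that entire window.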

Disk drive performance has not kept pace with Moore's law and the massive improvement in CPU performance. This means that storage system performance very often becomes the bottleneck, a problem compounded by the use of server virtualisation. To meet new storage performance demands, SSD technology has emerged to offer vastly improved data throughput and access latency, but at a cost. The challenge now is how to use SSD efficiently and cost-effectively in conjunction with traditional HDDs.

Because cloud-based storage services are priced according to capacity, technologies such as deduplication and compression are even more important for saving costs and minimising the capacity used. Data deduplication is a compression technique that eliminates duplicate copies of repeating data to improve storage utilisation. During analysis, unique chunks of data, or byte patterns, are identified and stored; any later chunk that matches one already stored is replaced with a small reference to it.
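
The idea can be illustrated with a minimal Python sketch. It assumes fixed-size 4 KB chunks identified by their SHA-256 digest; commercial products often use variable-size chunking and their own index structures, so this is a teaching example rather than a description of any vendor's implementation. The function names are hypothetical.

```python
import hashlib


def deduplicate(data, chunk_size=4096):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns (store, recipe): store maps a chunk's SHA-256 digest to its
    bytes, and recipe lists the digests in their original order.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is kept
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original data from the stored chunks and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Four chunks of input, three of which are identical.
data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = deduplicate(data)

print(f"Chunks referenced: {len(recipe)}")
print(f"Chunks stored:     {len(store)}")
print(f"Restored intact:   {restore(store, recipe) == data}")
```

Here four chunks of input occupy the space of two, because the three identical chunks are stored only once and the rest are references.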

What distinguishes today's storage requirements from those of 30 years ago is the need to store, interpret and retrieve large sets of data in a very short space of time. Thanks to advancements in software, modern arrays are able to do much more than just store data; they allow the most effective allocation of storage resources based on workload and data life cycle.

Warren Reid is the marketing director for EMEA at Dot Hill