IBM flexes its storage muscles with 120PB beast

IBM is putting the finishing touches on a massive data storage facility constructed from 200,000 hard drives and boasting a capacity of around 120 petabytes.

A petabyte is around a thousand terabytes, and a terabyte is in turn around a thousand gigabytes - a unit of measurement which you're likely to be more familiar with. At that scale, the full 120PB equates to around 24 billion MP3 files of roughly 5MB each.
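For anyone who wants to check the sums, here is a minimal sketch of the conversion. It assumes decimal (SI) units and a typical MP3 size of 5MB; that file size is an assumption chosen because it reproduces the 24 billion figure, not a number supplied by IBM.

```python
# Decimal (SI) units, as used by hard drive manufacturers.
GB = 10**9
TB = 1000 * GB
PB = 1000 * TB

capacity_bytes = 120 * PB

# Assumption: a "typical" MP3 is about 5MB. This is not an IBM figure.
ASSUMED_MP3_BYTES = 5 * 10**6

mp3_count = capacity_bytes // ASSUMED_MP3_BYTES
print(f"{mp3_count:,} MP3 files")  # 24,000,000,000 MP3 files
```

Using binary units (1,024 rather than 1,000 at each step) would push the total somewhat higher, so the headline number is best read as a rough guide.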

That's not the use that IBM has in mind for the device, of course. According to coverage of the company's work on MIT's Technology Review, the massive data store has been developed by IBM's research arm for a customer who needs a way to store the data required for a supercomputer to accurately simulate real-world phenomena.

While most people are aware that supercomputers need stacks of processing power to churn through their work, storage is often overlooked. The amount of data in an average supercomputer-size simulation is, however, staggering, and grows still larger when the system needs to find somewhere to store its output.

The result: IBM's 120PB monster storage unit, the largest single logical storage device in the world. It's so big that the file allocation table - which keeps track of where files can be physically found within the file system, along with their names and attributes - takes up a massive two petabytes of that storage for itself.
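To put that overhead in proportion, the quick calculation below works out what share of the pool the index consumes. It is only a sketch based on the two round figures quoted above (120PB in total, 2PB for the table), both of which are approximate.

```python
from fractions import Fraction

total_pb = 120  # quoted total capacity, in petabytes
index_pb = 2    # quoted size of the file index, in petabytes

index_share = Fraction(index_pb, total_pb)  # exactly 1/60
remaining_pb = total_pb - index_pb          # capacity left for file data

print(f"index share: {float(index_share):.2%}")  # index share: 1.67%
print(f"left for data: {remaining_pb}PB")         # left for data: 118PB
```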

Although the unnamed device is constructed from traditional hard disks - 200,000 of them, in fact - IBM's put a lot of thought into the technology behind it. The drives are mounted in wider-than-average racks, with a complex water-cooling system keeping each at an optimum temperature and minimising failures.

That's not to say that failures won't happen: with that many disks, it's inevitable that some will fail. As a result, many of the drives don't contribute to the overall storage capacity but instead offer redundancy, while new techniques developed at IBM Almaden's storage research division help to keep the system chugging along while it waits for a drive to be swapped out.
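A rough back-of-the-envelope sketch shows why failures are a daily concern at this scale. The 2 percent annual failure rate used here is purely a hypothetical value for illustration, since IBM has not published failure figures for the system. The per-drive number is simply the quoted capacity averaged across all 200,000 drives, redundant ones included, so the real drives are presumably larger.

```python
PB = 10**15
GB = 10**9

total_capacity = 120 * PB  # quoted capacity
drive_count = 200_000      # quoted number of drives

# Quoted capacity averaged over every drive, including redundant ones.
usable_per_drive_gb = total_capacity / drive_count / GB

# Hypothetical annual failure rate (2%); an assumption, not an IBM figure.
ASSUMED_AFR = 0.02

expected_failures_per_year = drive_count * ASSUMED_AFR
expected_failures_per_day = expected_failures_per_year / 365

print(f"average usable capacity per drive: {usable_per_drive_gb:.0f}GB")
print(f"expected failures per year: {expected_failures_per_year:.0f}")
print(f"expected failures per day: {expected_failures_per_day:.1f}")
```

Under that assumed rate, the system would lose around 4,000 drives a year, or roughly 11 a day, which is why it has to keep running while replacements are fitted.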

Full details of the hardware used in the system have not yet been released, but the company has confirmed that it is using a tweaked version of its General Parallel File System, or GPFS. The file system already offers impressive clustering features such as efficient indexing of directory entries for particularly long lists of files, distributed locking, and distribution of the metadata, including the directory tree.

The other information IBM isn't divulging is the price - but expect the company's unnamed customer to be paying through the nose to benefit from a storage pool several times larger than anything else on the planet.