The marriage of tape and cloud: achieving cost-effective data preservation with the hybrid cloud approach

(Image credit: Shutterstock/Nattapol Sritongcom)

At a time when data preservation is a priority for many businesses, the need for a reliable and secure way to safeguard data has never been greater. In determining their long-term storage strategy, organisations must evaluate the reliability, security, performance and costs of multiple data storage solutions, including cloud and tape storage. It may come as a surprise that, despite the hype around public cloud, recent generations of tape are in many cases proving as viable as the public cloud, if not more so.

In addition, organisations are finding that a blend of tape and cloud makes for an optimal, cost-effective solution that fulfils the above criteria and protects data in the long term. However, embracing this combined approach first requires the unravelling of some data storage myths.

Fact vs myth: perceptions of tape and cloud

With all the cloud buzz and hype, the overriding impression is that cloud has almost entirely taken over the storage market. Observing cloud marketing, one could be forgiven for regarding tape as merely a relic of the past. This thinking could not be further from the truth.

Tape is still the prevalent and trusted choice of many of the world’s most well-established and data-hungry organisations across a variety of industries, including healthcare, media and entertainment, and high-performance computing. These businesses rely heavily on tape for digital preservation and archiving to meet their long-term needs. Behind the scenes, tape has made some tremendous technological advances, such as dramatic performance improvements, higher capacities, faster retrieval times and efficiencies that rival today's most effective data storage solutions.

Tape technology has increased in density every few years, as attested by the 2019 and 2017 Tape Storage Council memos. In the last 10 years, LTO tapes have increased capacity by 1,400 per cent, performance by 200 per cent, and reliability by 9,900 per cent. Tape had four major improvements in 2017 alone, including a new generation of tape drive and media. The next generation, LTO-8, doubled the native capacity of LTO-7 media and improved throughput by 20 per cent. The LTO roadmap projects that native capacities of LTO drives will approximately double with every subsequent generation. Consequently, compressed capacities are anticipated to rise from 60 TB with LTO-9 to 120 TB with LTO-10, 240 TB with LTO-11 and 480 TB with LTO-12.
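To put those percentages and roadmap figures in concrete terms, here is a minimal Python sketch. It uses only the numbers quoted above and assumes an exact doubling per generation, where the roadmap says "approximately". The function names `multiplier` and `project_capacities` are hypothetical, chosen purely for illustration.

```python
def multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a growth factor (100% -> 2x)."""
    return 1 + percent_increase / 100


def project_capacities(first_gen: int, first_tb: int, generations: int) -> dict:
    """Assume capacity doubles exactly once per generation (a simplification)."""
    return {f"LTO-{first_gen + i}": first_tb * 2**i for i in range(generations)}


# Ten-year improvements quoted in the article, expressed as growth factors.
for metric, pct in (("capacity", 1400), ("performance", 200), ("reliability", 9900)):
    print(f"{metric}: +{pct} per cent is a {multiplier(pct):g}x improvement")

# Compressed capacities in TB, starting from the quoted 60 TB for LTO-9.
print(project_capacities(first_gen=9, first_tb=60, generations=4))
```

In other words, a 1,400 per cent increase means fifteen times the original capacity, not fourteen.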

Even with a ‘cloud first’ mandate, organisations are finding that a combination of cloud and tape for archiving is an extremely effective method of ensuring reliability, security, efficient workflows and cost savings. As many enterprises that jumped headfirst into the cloud have later discovered, the ‘one-size-fits-all’ approach adopted by many cloud providers is not necessarily a good fit for their long-term business requirements, either economically or practically.

Another misconception is that the price of tape is greater than cloud. In reality, tape innovation has significantly lowered its cost, increased capacity and further improved speed and reliability.

So in weighing the key criteria to select a long-term data storage solution, how does tape compare to cloud? 

Reliability

Tape is a highly reliable medium. With some annual maintenance, tape libraries have an almost indefinite lifespan, and data stored on a tape cartridge can be continuously migrated to newer generations. The cartridge life expectancy of magnetic tape is an impressive 30 years.

For cloud to achieve the remarkable reliability and uptime figures of between nine and eleven 9’s (as extolled in much of its marketing), it requires the elaborate task of replicating data to multiple cloud storage sites. In addition, it requires multiple copies to be placed within each site, all while paying a per-GB cost per month for total capacity, meaning that two copies double the storage capacity billed each month. The alternative, storing data at only one cloud storage site, has seen many recorded individual failures and does not make sound business sense.

In addition, while most cloud providers have Service Level Agreements (SLAs) in place, which outline the commitment from the cloud host, the SLA almost never guarantees that the data will be retrieved within a given timeframe. In many cases there is no certainty of getting the data back at all. Surprisingly, cloud providers are not responsible for lost, corrupted, or encrypted data that cannot be accessed. Should data not be accessible due to downtime during a cloud migration project, it could have far-reaching implications for the success of that migration. Local storage on tape will always be the fastest and most secure way to retrieve large amounts of data.

Security

Due to the basic set up of most networks, which do not protect data with encryption-enabled security software, data and security breaches have become a common occurrence. According to MIT Technology Review, “Cybercriminals will target SaaS and cloud computing businesses which store and secure private data.” Cloud storage providers reassure customers that all data is securely encrypted, but with providers regularly changing their security software, updates can have unknown security vulnerabilities. These susceptibilities allow hackers to hit multiple accounts, or even multiple services, simultaneously.

Ransomware (where downloaded malware encrypts the data so the user is unable to access it without paying exorbitant costs for a private key) has been on the increase, leaving businesses vulnerable. And while the malware itself can be easily removed from the system, without the encryption key organisations have no way to get their data back. According to a Global Ransomware Marketplace Report for Q4 2018, on average, only 86 per cent of data is recoverable after running decryption (the tool supplied after paying the ransom).

Some cloud providers offer to store accessible copies of data in the cloud on a per-month basis, which could be a way to tackle the ransomware risk, but customers have to pay an extra cost for the privilege. In addition, restoring data that has been previously archived may result in it landing on a more expensively priced, “infrequent access” storage tier, where the data must remain for 30 days, or the customer incurs an “early deletion fee”.

The only real way to keep data completely protected from ransomware is by storing a copy of data on tape cartridges behind an “air gap”. This air gap is created by the media’s removal from the drive, thereby disconnecting it from any network at all. Without that connection, it is no longer exposed to a network attack. Because tape cartridges do not need power or a connection to a network, tape is currently the only storage medium that can deliver this assurance.

Speed: data storage and retrieval

As the transfer process of backing up and archiving enormous quantities of data or project files is a significant undertaking, finding the fastest and most reliable medium is vital.

The reputation tape had about 40 years ago of being “slow” has long since changed. Today, tape can transfer data at extremely fast rates, even outpacing disk, and is highly scalable. Performance can be enhanced by adding tape library upgrades and expansion frames. To give an example of tape’s transfer speed, a modest 24-drive installation can write 60TB of data in an hour, 1PB in less than a day, and 10PB in a week.

Cloud storage is perceived as being fast and agile, and this is true to varying degrees, although customers have limited control and flexibility over how data can be downloaded. The bandwidth of cloud can be scaled to accommodate larger file transfers, but archiving speed is generally determined by an organisation’s internet bandwidth. To compare, if a business archived the same 60TB to the cloud instead of tape over a 1Gbps connection, the transfer would take a little over six days. Most businesses allow only around a quarter of their bandwidth for cloud uploads; at that rate, equivalent to a dedicated 250Mbps connection, archiving 60TB would take 25 days, almost a whole month to place that project into the cloud! So while organisations today have found that getting data to the cloud is “fast enough”, it is the restore and recall of data that is a lengthy process.
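The transfer times above follow from simple arithmetic, which the Python sketch below reproduces. It is illustrative only: it assumes the link runs at its full rated speed with no protocol overhead, and the function name `transfer_days` is hypothetical. Read as decimal terabytes, 60TB over 1Gbps works out at roughly five and a half days; read as binary tebibytes, it is about six. Real-world overhead pushes both figures higher, which is consistent with the estimates quoted here.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link at its full rated speed."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400


TB = 10**12   # decimal terabyte
TIB = 2**40   # binary tebibyte

# Cloud upload: the same 60TB over a 1Gbps and a 250Mbps connection.
for label, unit in (("decimal TB", TB), ("binary TiB", TIB)):
    for mbps in (1000, 250):
        days = transfer_days(60 * unit, mbps * 10**6)
        print(f"60 {label} at {mbps} Mbps: {days:.1f} days")

# Tape: average rate per drive implied by 60TB per hour across 24 drives.
# This is derived from the article's figure, not from a drive specification.
per_drive_mb_s = 60 * TB / 24 / 3600 / 10**6
print(f"Implied average per drive: {per_drive_mb_s:.0f} MB/s")
print(f"1 PB at 60 TB per hour: {1000 / 60:.1f} hours")
```

Substituting an organisation's own archive size and usable bandwidth gives a quick first estimate of how long an upload or a full restore would take.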

Data retrieval

Though improvements have been made in data retrieval, cloud providers have differing protocols for retrieving data from the cloud, and these can be complex.

The “recovery time objective” (RTO) is essentially the maximum amount of time an organisation has to recover its data before its business operations suffer. If the company can afford to go down for half a day but no more, then cloud would not be a viable option, as its data would not be available to the business within that time frame. Organisations must take steps to ensure they are matching the storage target with the time to recover, to ensure operational efficiencies.

To aid in cloud data retrieval, developers have created third-party application GUIs, but there are no guarantees these will work, making troubleshooting difficult for cloud service providers.

In contrast to a data retrieval time of seconds from the frequently accessed tier of cloud, for some providers, retrieval requests from the archive storage tier can take between three and five hours to source a single file. In a disaster recovery scenario, this length of retrieval time is simply not viable. And as the speed of recovery usually depends on the amount of data to be retrieved from the archived tier, tape will always be faster than cloud, as the data on tape would be readily available. And although retrieving smaller files from the more frequently accessed tier could be marginally faster than tape (depending on the bandwidth), customers would be paying ten times the storage cost.

Redundancy

As failures can occur on any storage medium, affecting a single file to multiple terabytes, redundancy must always be factored in when considering data growth. Having multiple copies is vital, not only for disaster recovery, but also for any type of data storage failure. Redundancy with tape storage is straightforward: a set of files is written to two separate tapes.

When it comes to cloud, redundancy can be as simple as placing two copies in the cloud. The user chooses which regions to store data in, and where there is more than one location, the cost is multiplied by the number of copies. So placing two copies in the cloud can impact pricing tiers, with twice the volume of data potentially doubling the cost. Due to these concerns, some users are storing one copy in the cloud and another on a different storage medium, such as tape or disk, on-premises. That way, should disaster strike in the cloud, a copy can always be retrieved from on-premises, which saves time and money over retrieving data from the cloud.

Expected future costs

One of the most important factors when deciding on a storage option is the total cost of ownership (TCO). Tape’s TCO is fairly predictable, with the greatest expense being the initial tape library. Media migration need only occur after lengthy periods of time, and critical data can be transferred to new generations quickly. Innovative solutions can automate the tape migration process so that it takes place in the background with minimum user interference.

Cloud’s TCO has not changed much over the last several years. The anticipated 50 per cent drop in cloud prices (due to increased competition and more efficient processing and storage techniques used by large cloud providers) did happen, but with a twist: the cost to access the data is now twice that of the previous tier. Amazon Web Services recently announced a Deep Archive tier that allows organisations to store data for a staggering $0.001 per GB per month, or $1.01 per TB per month, compared with Amazon's Glacier tier, which costs $0.0046 per GB per month. On the surface, this seems like a huge reduction in the cost of using the cloud but, on closer inspection, a standard retrieval from Deep Archive costs twice as much as one from Glacier and takes three times longer to make data accessible. So cloud storage costs have come down (in some cases quite dramatically), and this reduction will continue, but customers must be vigilant as to where cloud providers might seek to recoup their money in ways that are usually harder to plan and budget for.
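As a rough illustration of how these storage prices scale, the Python sketch below computes a monthly bill from the per-GB figures quoted above and multiplies it by the number of copies held, as described in the Redundancy section. It is a simplification under stated assumptions: 1TB is taken as 1,000GB, the prices are those given in this article rather than current list prices, and retrieval, request and egress fees (the hard-to-budget part) are deliberately left out. The function name `monthly_storage_cost` is hypothetical.

```python
def monthly_storage_cost(terabytes: float, price_per_gb_month: float,
                         copies: int = 1, gb_per_tb: int = 1000) -> float:
    """Monthly storage charge in dollars; excludes retrieval and egress fees."""
    return terabytes * gb_per_tb * price_per_gb_month * copies


# Per-GB monthly prices as quoted in the article (assumed, not current pricing).
PRICES = {"Deep Archive": 0.001, "Glacier": 0.0046}

for tier, price in PRICES.items():
    for copies in (1, 2):
        cost = monthly_storage_cost(60, price, copies)
        print(f"{tier}, {copies} copy/copies of 60TB: ${cost:,.2f} per month")
```

The storage line is the predictable part of the bill; the retrieval charges and waiting times described above need to be modelled separately against each provider's published terms.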

The marriage of tape and cloud: hybrid archiving

A common storage idiom worth remembering is: “Three copies on two different media, one offsite, and one offline”. A combination of tape and cloud storage could successfully fulfil this basic storage principle. The recommendation would be to always retrieve the local copy of data (kept on-premises) first and, should that method fail, then and only then would the data in the cloud need to be accessed (as supplemental disaster recovery). In that way, data that will only ever be retrieved under the worst possible circumstances could be preserved in deep storage, such as Amazon Glacier. To control both on-premises and cloud storage, organisations should consider a single converged storage system, such as Spectra’s BlackPearl, that can address multiple storage targets and apply policies that determine the number of copies stored on each chosen target.
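The idiom can be expressed as a simple check, sketched below in Python. This is an illustration rather than any vendor's policy engine: the `Copy` class and `satisfies_rule` function are hypothetical, and the sketch assumes the rule means at least three copies, at least two distinct media, at least one offsite copy and at least one offline copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    medium: str     # e.g. "disk", "tape", "cloud"
    offsite: bool   # held away from the primary site
    offline: bool   # disconnected from any network (air-gapped)


def satisfies_rule(copies: list) -> bool:
    """Three copies, two different media, one offsite, one offline."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
        and any(c.offline for c in copies)
    )


# Hybrid plan: working copy on disk, ejected tape on-premises, archive in cloud.
hybrid = [
    Copy("disk", offsite=False, offline=False),
    Copy("tape", offsite=False, offline=True),
    Copy("cloud", offsite=True, offline=False),
]

# Cloud-only plan: three replicas, but one medium and nothing offline.
cloud_only = [Copy("cloud", offsite=True, offline=False)] * 3

print(satisfies_rule(hybrid))
print(satisfies_rule(cloud_only))
```

The hybrid plan passes because tape supplies the offline copy and cloud supplies the offsite one; the cloud-only plan fails on both the media and offline conditions.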

As cloud costs can vary dramatically, it makes good sense to keep an eye on how the cloud market structure is shifting. In the event that an existing cloud provider changes its pricing structure, or a new provider appears, organisations will want to be in a position to easily switch cloud vendors. In such circumstances, an on-premises copy of data would provide the organisation with the option to abandon their cloud data, rather than pay exorbitant egress fees. Moreover, with modern-day object storage, a seamless change of cloud vendor could be achieved by uploading a local copy from tape at the touch of a button.

The best storage strategy comes down to a solution that optimises the strengths of both tape and cloud, thereby reducing the risk of failure. Data growth, the level of collaboration required and the diversity of workflows are all driving end users toward a new model of data storage. The traditional file-based storage interface is well suited to in-progress work but breaks down at web scale. Object storage, on the other hand, is built for scale. Therefore, the combination of tape as a primary backup and cloud for disaster recovery, behind an object storage device that handles the workflow and is built for scale, creates a powerful union. With this approach, data can be cost-effectively and efficiently protected in the long term.

Eric Polet, Product Marketing Manager, Spectra Logic