Many organizations still rely on tape for backup – but should they, in an increasingly digital world? The issue came into sharp focus when one of our customers asked us whether they should transfer all their backup data from tape to cloud storage.
The data in question totaled 2 petabytes spread across several hundred tapes, covering the last five years’ worth of backups. The organization was exploring the idea of transferring it to the cloud and had been offered a tape restore service. At first glance the outsourced solution seemed expensive, and they wanted a second opinion.
We began by asking them what they wanted to achieve and – the key question – why they thought this was necessary for their business. It was not just a question of the relative merits of tape and cloud storage. The issue they needed to address was more fundamental: how did they expect to use the data that they so carefully backed up every week and month? In other words, why were they storing so much data, did they really need it, and how quickly and easily could they access what they required if the need arose?
Tape still exists within cloud
Tape has been with us for a long time and, once set up, it is easy to expand. But it certainly has an image problem: it can be unreliable, and data may not be easy to retrieve from it in an emergency. Most in-house IT teams loathe tape backup because of its tendency to fail. Despite this, it remains an effective and cheap option for long-term data retention and off-site archiving, and many organizations still have some dependence on it.
What is less obvious is that long-term cloud storage also uses tape. Look at archive blob storage in Microsoft Azure, for example, and you will find data stored on tape. It’s the same with AWS Glacier. The tape is just hidden behind the cloud facade, so the user may not even realize it’s there unless they ask their cloud provider. In other words, by moving data from tape to cloud, organizations are not necessarily choosing a more technically reliable medium; they are handing the problem of recoverability to the cloud provider. The decision whether to change is about strategy, not technology.
Tape indexing may be insufficient
To evaluate the tape-to-cloud proposal, our customer had to decide what they wanted and why they thought it was necessary. If they wanted to be able to restore by date – thereby verifying the state of a file on a certain date – this would require all the data to be retained. If they wanted to restore by folder, filename or block, they could use deduplication to remove the many copies of the same file held, and would therefore require significantly less storage. Typically, deduplicating 2PB of data will reduce it to 15-20 percent of its original size, which would leave this organization with around 300-400TB requiring storage.
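The arithmetic behind that estimate can be checked with a short sketch. It assumes decimal units (1PB = 1000TB) and treats the 15-20 percent figure as a rule of thumb, not a guarantee; the function name `deduplicated_size_tb` is purely illustrative.

```python
def deduplicated_size_tb(original_tb: float, retained_fraction: float) -> float:
    """Return the storage still needed after deduplication, in terabytes.

    retained_fraction is the share of the original data left once
    duplicate copies are removed (0.15 means 15 percent remains).
    """
    if not 0.0 <= retained_fraction <= 1.0:
        raise ValueError("retained_fraction must be between 0 and 1")
    return original_tb * retained_fraction


# Assumption: 2PB expressed in decimal terabytes (1PB = 1000TB).
ORIGINAL_TB = 2000

low = deduplicated_size_tb(ORIGINAL_TB, 0.15)
high = deduplicated_size_tb(ORIGINAL_TB, 0.20)
print(f"Estimated storage after deduplication: {low:.0f}-{high:.0f}TB")
```

The real ratio depends heavily on how many repeated copies the backups contain, so any such figure should be validated against a sample of the actual tapes.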
They also needed to consider how they would search for the data they wanted to retrieve. Even looking for a specific piece of data such as a name will require computer forensic techniques, as there is not enough detail on a tape index to say that Joe Bloggs sent an email on 14th January with an attachment (although more recent systems include this type of search capability). Restoring a file or folder is significantly easier.
It was not worth investing in either solution if what could be restored was not useful to the business. And there was no easy way to establish this without checking the integrity of the data and cataloguing and indexing all 350 tapes.
The outsourced tape solution provider had offered to catalogue the data and review every tape – one of the reasons for the high cost. The customer would then be able to retrieve all the data in one lump or procure an insurance policy to choose which tapes to restore. The data would be accessible via TFTP or customer-supplied NAS for restore.
Is more data a legal requirement or a liability?
At this point every organization must consider what it actually needs the data for in order to set out its backup strategy. It may need to accept the cost of restoring by date for regulatory or legal reasons. In a regulated environment, for example, where Sarbanes-Oxley controls are necessary, having the right backup strategy is essential.
However, in other circumstances having large quantities of data may be a liability. Searching through files and retrieving data could be an unnecessary cost and a distraction from business as usual. For example, in the event of a GDPR subject access request or a right-to-be-forgotten request, holding more data will increase the time and cost of responding.
Each organization has to ask itself:
- How long will it take to retrieve a specific piece of data?
- What is this data worth to the business when it has been retrieved?
- What are the benefits to the business of keeping specific types of backup and what are the costs of not keeping them?
In defining its policy, an organization should consider both the letter and spirit of compliance. However much data it decides to keep, it should be able to demonstrate that it has done the best it can to respond to whatever request for data it has received.
This should include a clear process for taking and handling backups, with a time limit on storage duration, and a process for destroying old data. It is worth taking the time to work out what the minimum data required is, keeping only what is needed. For example, it may be appropriate to decide that for data over five years old, only certain specific items will be retained. Again, newer backup products make this easier.
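As a rough illustration, a retention rule of this kind could be expressed as follows. This is only a sketch: the `Backup` record, the category labels and the five-year limit are assumptions made for the example, not features of any particular backup product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Backup:
    """Hypothetical catalogue entry for one backup set."""
    name: str
    taken_on: date
    category: str


def years_before(day: date, years: int) -> date:
    """Return the date `years` years before `day`."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return day.replace(year=day.year - years, day=28)


def due_for_destruction(
    backups: Iterable[Backup],
    today: date,
    max_age_years: int,
    exempt_categories: Set[str],
) -> List[Backup]:
    """List backups older than the limit that are not in an exempt category."""
    cutoff = years_before(today, max_age_years)
    return [
        b for b in backups
        if b.taken_on < cutoff and b.category not in exempt_categories
    ]


catalogue = [
    Backup("weekly-2018-01", date(2018, 1, 7), "general"),
    Backup("contracts-2018", date(2018, 1, 7), "contracts"),
    Backup("weekly-2022-03", date(2022, 3, 6), "general"),
]
expired = due_for_destruction(catalogue, date(2024, 6, 1), 5, {"contracts"})
print([b.name for b in expired])
```

Writing the rule down in this form makes the policy auditable: the time limit and the exempt categories are explicit, and the list of backups to destroy can be reviewed before anything is deleted.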
In some sectors, such as construction, organizations take a final backup of everything important at project close-down and should then delete the physical files or copies from all other sources. Those backups need to be accessible in perpetuity (or at least for a very long time). With new backup solutions being rolled out every five years, it is important to ensure that these existing archives are still accessible and, if necessary, are transferred to new media. Cloud at least takes the media-change issue and makes it someone else’s problem.
Cloud-native organizations have it easy
Organizations implementing cloud backup today without legacy tape to contend with have a much easier job. In cloud-based systems, with deduplication and compression enabled, data storage is less of an issue. Products such as Veeam offer incremental backup forever. This means that in theory they hold only one copy of each file and simply record the changes. You can then write a synthetic full backup weekly or monthly to enable you to restore all the data. Setting retention periods is standard and the search capabilities are significantly more advanced. Office 365 and Druva also offer sophisticated indexing and search, enabling the location of data in core files.
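The idea of incremental-forever backup with a synthetic full can be shown with a toy model. The sketch below is an assumption-laden simplification and not how Veeam or any other product implements it: files are reduced to a path and a version label, and each increment records only what changed or was deleted since the previous backup.

```python
from typing import Dict, Iterable, Optional

# Toy model: a snapshot maps each file path to a label for its content.
Snapshot = Dict[str, str]
# An increment maps a path to its new content, or to None if it was deleted.
Increment = Dict[str, Optional[str]]


def synthetic_full(base: Snapshot, increments: Iterable[Increment]) -> Snapshot:
    """Build a full backup by replaying increments, oldest first, over a base."""
    result = dict(base)  # copy so the original full backup is left untouched
    for increment in increments:
        for path, content in increment.items():
            if content is None:
                result.pop(path, None)  # file deleted since the last backup
            else:
                result[path] = content  # file added or changed
    return result


first_full = {"report.docx": "v1", "budget.xlsx": "v1"}
monday = {"report.docx": "v2"}                          # one file edited
tuesday = {"budget.xlsx": None, "plan.pptx": "v1"}      # one deleted, one added

print(synthetic_full(first_full, [monday, tuesday]))
```

Only the first backup holds every file; each later run stores just the differences, and the synthetic full is assembled from data already in the repository instead of being read again from the source systems.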
Organizations with tape, however, face the same dilemma as our customer. You may be wondering what decision they reached. At this stage, the process is still ongoing. They decided to move their secondary backup to Microsoft Azure, so they are no longer accruing large volumes of tape storage themselves. They are still evaluating whether to take up the option of transferring their older data to the cloud or to retain it on NAS and catalogue it themselves. However, they now understand the strategic considerations behind their decision, and have developed a robust backup policy.
Drew Markham, Service Strategist, Fordway