Today, every business has data at its core. The data mountain has never been bigger. IDC predicts that the global volume of data is likely to rise to 175 ZB by the middle of the decade.
Most organizations routinely back up their data, assuming that their processes are robust and the data is sound. However, a variety of issues can hinder data retrieval, and some of them are not discovered until it is too late.
The bottom line is that an organization has a duty to preserve relevant data for review upon reasonable notice of litigation. Managing data in a manner that makes it difficult, if not impossible, to recover will not excuse a corporation from that duty. Put simply, some organizations are gambling that their legacy data will be accessible and usable if needed.
Some of the most common threats to data accessibility include failure of the backup software or the storage media, the sheer volume of data the organization holds, ageing or obsolete systems, forensically unsound methods, simple human error, and disasters such as fire, water damage, mud, extreme cold or heat, and other natural catastrophes.
Four tips for mitigating the data accessibility gamble
Within the larger context of information life cycle management, organizations are looking to data management experts to help them manage stored data more efficiently and reduce the load on IT personnel and infrastructure. As part of your solution, it is important to consider the following four tips:
Define the project
The success of a project involving the manipulation of stored data depends on the ability of those tasked with the work to identify and understand the project scope and challenges so they may plan accordingly.
Recording the type of media and its condition is just as important as identifying a suitable target medium. Even after severe damage (for example from water or fire), some degree of recovery is usually possible, which also offers an opportunity to reorganize the company’s long-term backups. In this scenario, it is important to work quickly, before adhesion or corrosion makes the media unusable. Data protection requirements must also be established at the outset. For example, if data cannot leave the company, the conversion must be done on-site. Or perhaps obsolete servers need to be rebuilt in such a way that the previous access rights can be reconstructed as well.
Analyze the data
An organization must identify the contents of the media to make informed decisions later about data retention, destruction, or suitability for compliance or litigation readiness. Depending on the business needs, scanning, cataloguing or indexing the media can help an organization narrow its focus to the relevant media.
Enterprise backup software is designed for managing large quantities of data, not for identifying and accessing specific content. It is complex and requires a relational database to manage backup parameters, sessions, schedules, errors and other statistics.
Without the original backup software or the specific tape drive or other equipment that recorded the data, content identification will be the biggest hurdle and one of the higher costs of the project. “Cataloguing” and “indexing” have different meanings among vendors of long-term backup software. A long-term backup’s catalogue usually refers to the backup sessions on a media set. Some backup vendors save this identification metadata on the tape itself. However, a growing number of backup software vendors put only a Media ID, Backup ID, or Session ID on the media, which refers back to the software’s relational database.
Manage and refine the data
Organizations regularly complete incremental backups daily or weekly and full backups at month-end. Although this is industry “best practice”, it results in the creation of multiple copies of the same data.
Based on the previous analysis and knowledge of the organization’s backup procedures, the relevant data set can be culled further. Assuming there is no active legal hold on the data, the duplicate data can be deleted. If the data must be retained, backups can be consolidated by restoring them to higher-capacity tapes.
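One common way to find the duplicate copies described above is to compare content hashes of restored files. The following is a minimal sketch under stated assumptions, not a description of any particular vendor's tooling: the function name `find_duplicates`, the path naming scheme, and the sample data are all hypothetical, and it assumes the files have already been restored from the backup sets.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group paths by the SHA-256 digest of their content.

    `files` maps a hypothetical path (backup set + file name) to its bytes.
    Only groups that contain more than one path are returned.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Illustrative data: the same report appears in a full and an incremental backup.
restored = {
    "full_2023-01/report.doc": b"quarterly figures",
    "incr_2023-01-02/report.doc": b"quarterly figures",
    "incr_2023-01-03/report.doc": b"quarterly figures, revised",
}
duplicates = find_duplicates(restored)
```

Here `duplicates` holds one group containing the two identical copies, while the revised file is left out. Any deletion based on such a list would still need to respect legal holds and retention rules.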
Review data conversion or manipulation needs
It is important to understand the degree of complexity involved to keep the project on schedule and within budget.
Some conversions are straightforward, such as copying files from one platform into a form that another platform can read. Other conversions require more technical expertise. For example, consider how digital content specifications differ between mainframe, midrange, and desktop systems. IBM mainframe and AS/400 systems use the EBCDIC code to represent characters, while ASCII is the norm in most other environments. Maintaining data accessibility for this type of project requires translation work, such as converting an AS/400 database of fixed-length EBCDIC records into variable-length ASCII records or a .csv file for a PC.
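To illustrate the kind of translation involved, here is a minimal sketch that decodes fixed-length EBCDIC records and writes them out as CSV. It rests on several assumptions: the record layout (`LAYOUT`) and the sample record are invented for illustration, the data uses code page `cp037` (one of several EBCDIC code pages; others exist), and every field is plain character data. Real AS/400 exports often contain packed decimal or binary fields, which cannot be decoded as text and need separate handling.

```python
import csv
import io

# Hypothetical fixed-length record layout: (field name, width in bytes).
LAYOUT = [("customer_id", 6), ("name", 10), ("city", 8)]
RECORD_LENGTH = sum(width for _, width in LAYOUT)


def ebcdic_to_csv(raw: bytes, codec: str = "cp037") -> str:
    """Decode fixed-length EBCDIC records and return them as CSV text."""
    if len(raw) % RECORD_LENGTH != 0:
        raise ValueError("input is not a whole number of records")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _ in LAYOUT])
    for start in range(0, len(raw), RECORD_LENGTH):
        # cp037 is a single-byte code page, so one byte decodes to one character.
        record = raw[start:start + RECORD_LENGTH].decode(codec)
        row, pos = [], 0
        for _, width in LAYOUT:
            # Fixed-length fields are space-padded; strip the padding.
            row.append(record[pos:pos + width].rstrip())
            pos += width
        writer.writerow(row)
    return out.getvalue()


# Illustrative record, encoded to EBCDIC here only to simulate legacy input.
sample = "000042Jane Doe  Boston  ".encode("cp037")
csv_text = ebcdic_to_csv(sample)
```

Stripping the padding is what turns fixed-length fields into variable-length ones; the field boundaries themselves must come from the original system's record definition.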
A more complex conversion may involve the manipulation of fields in a database. For example, Payment Card Industry (PCI) compliance requires disguising cardholder data when storing credit card numbers. In this scenario, a data management expert could expand and extract the contents, find the cardholder numbers, and apply masking characters (such as “X”s) to the appropriate data.
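The masking step can be sketched as follows. This is an illustrative example only, not a compliance-grade tool: the function names are hypothetical, the choice to leave the last four digits visible is an assumption that should be checked against the applicable PCI requirements, and the Luhn checksum is used here simply to reduce false matches on other long digit runs.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def mask_card_numbers(text: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits of likely card numbers with X."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # leave other digit runs untouched
        return "X" * (len(digits) - visible) + digits[-visible:]
    return CANDIDATE.sub(replace, text)


# 4111 1111 1111 1111 is a widely used test number, not a real card.
masked = mask_card_numbers("Paid with 4111 1111 1111 1111 on order 1234567890123.")
```

In this example the test card number is masked while the order number, which fails the checksum, is left as it is. Note that separators inside a matched number are dropped in the masked output.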
Streamlining the effort required
A project involving the management and manipulation of stored data can be triggered by a variety of regulatory, compliance, or e-discovery needs. Planning for data accessibility streamlines the effort required to meet those needs and mitigates the associated risks.
Historically, it has been time-consuming, technically difficult, and cost-prohibitive to incorporate legacy data into an organization’s information life cycle management plan.
After relying on IT to restore the data, Legal would work with IT to analyze the relevant data required to support an investigation or lawsuit. Due to budget and infrastructure limitations, restoring thousands or tens of thousands of tapes was not feasible.
That problem has now been solved by using technology to streamline the entire process. Rather than rely on a false sense of security, many corporations seek expert consultants with proven experience in forensically sound methods and deep expertise in legal, compliance and IT issues. We suggest you do the same.
Philip Bridge, President, Ontrack