Five Steps to a Data Archive Strategy

Data centres around the world are being tasked with storing ever-greater amounts of digital content. This burgeoning storage requirement drives many data centre managers to recommend that the business increase investment in expensive IT storage resources.

However, some realise that they can use long-term archive strategies to significantly limit additional investment.

Instead of storing all data on expensive front line storage systems, they recognise that archive data can be migrated to more appropriate and cost-effective alternatives.

This forward-thinking approach not only reins in IT budgets but also delivers compelling business benefits over the long term.

Data centre managers who believe that an archiving approach to data growth is nice in theory but too complicated in practice risk missing a huge opportunity.

By thinking through five key issues, companies can begin to create a compelling archiving strategy.

Volume
Active data is data that is currently being created or used; static data is never changed and rarely accessed. Multiple studies have shown that 80% of all data stored on magnetic disk RAID systems (primary storage) is static data.

Moving as much as 80% of data off primary storage and onto secondary storage, such as optical, can slash management overheads.

Magnetic disk storage is expensive to operate, protect and replace when compared to other technologies that are more appropriate for infrequently accessed archive data.

The first step to any archive strategy is to separate active from static data in order to reduce the volume of data residing on primary storage.
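As an illustration, this first step can be sketched in a few lines of Python. The sketch assumes that "static" can be approximated as "neither modified nor accessed for 180 days"; that threshold, the FileRecord type and the function names are all hypothetical and would need tuning to your own environment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold: data untouched for this long is treated as static.
STATIC_AFTER = timedelta(days=180)


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_modified: datetime
    last_accessed: datetime


def is_static(record, now, threshold=STATIC_AFTER):
    """True if the file was neither modified nor accessed within the threshold."""
    last_touched = max(record.last_modified, record.last_accessed)
    return now - last_touched >= threshold


def split_active_static(records, now, threshold=STATIC_AFTER):
    """Divide records into the two buckets: (active, static)."""
    active, static = [], []
    for record in records:
        (static if is_static(record, now, threshold) else active).append(record)
    return active, static


def static_fraction(records, now, threshold=STATIC_AFTER):
    """Share of total bytes that is static, i.e. a candidate for the archive."""
    total = sum(r.size_bytes for r in records)
    if total == 0:
        return 0.0
    _, static = split_active_static(records, now, threshold)
    return sum(r.size_bytes for r in static) / total
```

Running static_fraction over an inventory of your primary storage gives a rough measure of how much data could move to the archive tier.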

Value
You now have your data divided into two buckets: active and static. The next step is to assess the value of this data so it can be properly managed.

You can assume that the value of active data to your business is high since it is currently being used. Determining the value of static data is more difficult since it is not all equal.

The most effective approach is to create categories defined by the value that the data represents and place static data into the most appropriate category.

This categorisation process allows you to define management policies over the life cycle of the data and, once in place, lends itself to automation.
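One way to picture the automation is as an ordered list of rules, where the first rule that matches a record decides its category. The following is a minimal sketch only: the category names, the metadata keys (department, type, tags) and the rules themselves are hypothetical placeholders for whatever your own value assessment produces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Category:
    name: str
    matches: Callable[[dict], bool]  # rule applied to a record's metadata


# Hypothetical categories, ordered from highest to lowest value.
# The first matching rule wins.
CATEGORIES = [
    Category("financial", lambda m: m.get("department") == "finance"),
    Category("email", lambda m: m.get("type") == "email"),
    Category("corporate_history", lambda m: "history" in m.get("tags", [])),
]

DEFAULT_CATEGORY = "uncategorised"


def categorise(metadata, categories=CATEGORIES):
    """Return the name of the first category whose rule matches the metadata."""
    for category in categories:
        if category.matches(metadata):
            return category.name
    return DEFAULT_CATEGORY
```

Ordering the rules by value means that a record matching more than one rule, such as an email from the finance department, is managed under the stricter category.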

Retention
Once your static data is categorised according to its value, you need to determine how long each category should be retained.

The most notable external factor influencing retention periods is government regulation. However, internal policies also have a role to play.

In either case, it's critical to your archive strategy that the retention period be clearly defined for each data category.

Most organisations find that they have several different retention period requirements. For example, corporate history may need to be retained indefinitely, financial records for 10 years and emails for 5 to 7 years.
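A retention schedule like this can be written down as a simple lookup table. The sketch below uses the example periods above and assumes the upper end of the 5 to 7 year range for emails; the category keys and function names are hypothetical, and real retention periods must come from your own regulatory and internal requirements.

```python
from datetime import date

# Retention period in years per category; None means retain indefinitely.
RETENTION_YEARS = {
    "corporate_history": None,
    "financial": 10,
    "email": 7,  # assumption: upper end of the 5 to 7 year range
}


def add_years(start, years):
    """Add whole years to a date, moving 29 February to 1 March if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February in a non-leap target year: round up so nothing expires early.
        return date(start.year + years, 3, 1)


def disposal_date(category, created):
    """Earliest date the record may be destroyed, or None if kept indefinitely."""
    years = RETENTION_YEARS[category]  # raises KeyError for an undefined category
    return None if years is None else add_years(created, years)


def is_due_for_disposal(category, created, today):
    """True once the retention period for the record has fully elapsed."""
    due = disposal_date(category, created)
    return due is not None and today >= due
```

Raising an error for an undefined category is deliberate: a record with no defined retention period should be flagged for review, not silently kept or destroyed.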

Given that retention periods are measured in years, it is important to choose a storage technology that provides long-term support and does not require frequent replacement. RAID storage has the shortest life, averaging between 3 and 4 years.

Magnetic tape lasts longer if properly maintained, typically 4 to 5 years. Professional optical storage has the longest life, with typical replacement cycles greater than 10 years.

Risk
While providing long-term access to archive records can be of enormous benefit to your company, these same records can also represent a liability if poorly managed.

For example, if your company is taken to court, it will be required to produce a wide range of records. Not only must they be delivered within a certain time frame, but you must also be able to demonstrate that they are authentic.

Without a controlled archive strategy this could be impossible, greatly increasing your risk. Even where an archive exists, it must be built with risk mitigation in mind.

If you put all of your archive data on magnetic tapes offsite, searching for and retrieving data can prove impractical.

Choose to store your archive data on rewritable magnetic disk and you may have an equally difficult time demonstrating that it has not been altered. Where data is retained beyond the legal obligation, it may be requested for disclosure, adding unnecessary risk.

In these examples, risk can be reduced by choosing a technology that allows data to be retained online, stored in a true WORM (Write Once Read Many) format, and physically destroyed at end-of-life.

Cost
When considering an archive solution, you must look at total cost of ownership (TCO) rather than pure acquisition cost.

Because an archive operates for many years, your financial assessment should take into account not only the initial purchase but also maintenance, system replacement and operating overhead.

With the huge increases in electricity costs, more CIOs require archive solutions with the lowest possible power consumption and cooling overhead to reduce their energy costs and lower their carbon footprint.

From a long-term financial perspective, solutions that seem inexpensive at first glance can prove to be very costly archives.

The best example of this is "inexpensive" RAID storage. The initial purchase cost may be relatively low, but after you factor in maintenance, backup overhead, frequent replacement, and very high power consumption, RAID systems are by far the most expensive archive alternative.
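The TCO comparison can be made concrete with a simple model. The sketch below is a deliberately simplified illustration: it assumes a system is repurchased at full price whenever its service life runs out within the planning horizon, and that maintenance and energy costs are flat from year to year. The ArchiveOption type, its field names and the function names are hypothetical, and you would need to supply your own cost figures.

```python
from dataclasses import dataclass


@dataclass
class ArchiveOption:
    name: str
    purchase_cost: float       # cost of one system, paid again at each replacement
    annual_maintenance: float  # support contracts, media handling, backup overhead
    annual_energy: float       # power and cooling
    service_life_years: int    # years before the system must be replaced


def replacements_needed(horizon_years, service_life_years):
    """Number of times the system is replaced within the planning horizon."""
    if service_life_years <= 0:
        raise ValueError("service life must be positive")
    # Ceiling division gives systems owned in total; subtract the initial one.
    return max(0, -(-horizon_years // service_life_years) - 1)


def total_cost_of_ownership(option, horizon_years):
    """Purchase plus replacements plus running costs over the horizon."""
    replacements = replacements_needed(horizon_years, option.service_life_years)
    capital = option.purchase_cost * (1 + replacements)
    running = (option.annual_maintenance + option.annual_energy) * horizon_years
    return capital + running
```

Over a 10-year horizon, an option with a 4-year service life is bought three times in this model, while one with a 10-year life is bought once, which is how a low purchase price can still lead to the higher total.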

Summary
The amount of information flowing into your data centre can seem overwhelming, but tackling the problem head on by developing an archive strategy is your best defence.

If your archiving policy is well designed and the correct technology selected, it can help you comply with regulations, manage corporate risk and make valuable information accessible in a way that could enhance your competitive advantage.

By considering these five issues when developing an archive strategy, organisations can avoid drowning beneath the tide and instead capitalise on the value of their information assets to improve the way they run their business.