
Archival: The new “killer” cloud application

All organisations face the challenge of managing low-touch, unstructured data that consumes valuable enterprise storage. For most organisations, this task has sat on the “back burner” for years. With the availability of low-cost public cloud storage, now is the perfect time for organisations to reduce the load of unstructured data on enterprise storage with a Next Generation Cloud Archive.

Managing unstructured data is a major issue, considering that unstructured content accounts for 90 per cent of all digital information. Given the cost of enterprise storage and the cost of supporting and maintaining aging document repositories, the proper management of unstructured data has a direct and significant impact on the IT budget.

The public cloud is the perfect platform for managing unstructured data. It offers virtually limitless scalability and the ability to access and share data from any location. Amazon, Azure, Google and other leading public cloud vendors sell ‘cold’ blob storage for as little as $0.01 to $0.02 per GB per month. Compared with the total cost of enterprise storage, that can mean savings of as much as 50X.
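The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration: the 100 TB archive size is hypothetical, and the enterprise price of $0.50 per GB per month is an assumption, chosen because it is the figure implied by a 50X saving at the $0.01 cold-storage price. Substitute your own numbers.

```python
def monthly_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Monthly storage cost in dollars for a given capacity and unit price."""
    return size_gb * price_per_gb_month


def savings_ratio(enterprise_price: float, cloud_price: float) -> float:
    """How many times cheaper the cloud price is than the enterprise price."""
    return enterprise_price / cloud_price


# Cold blob storage price range quoted in the article ($ per GB per month).
COLD_PRICE_LOW = 0.01
COLD_PRICE_HIGH = 0.02

# Assumptions, not figures from the article:
ARCHIVE_SIZE_GB = 100_000   # hypothetical 100 TB of unstructured data
ENTERPRISE_PRICE = 0.50     # assumed fully loaded enterprise cost, $ per GB per month

if __name__ == "__main__":
    for cloud_price in (COLD_PRICE_LOW, COLD_PRICE_HIGH):
        cloud = monthly_cost(ARCHIVE_SIZE_GB, cloud_price)
        enterprise = monthly_cost(ARCHIVE_SIZE_GB, ENTERPRISE_PRICE)
        ratio = savings_ratio(ENTERPRISE_PRICE, cloud_price)
        print(f"cloud ${cloud:,.0f}/month vs enterprise ${enterprise:,.0f}/month "
              f"({ratio:.0f}X)")
```

Note that at the upper end of the quoted cold-storage range, the same assumed enterprise price gives a smaller multiple, so the result depends heavily on what you count in the total cost of enterprise storage.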

First analyse

The first step to managing unstructured data is to analyse it carefully. Take an inventory of your file shares and note the name, type, age and owner of each file. Work together with department business owners to classify data according to its business and legal value. Data that has no business or legal value should be promptly removed.
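As a rough illustration of the inventory step, the sketch below walks a file share and records each file's path, type, age and owner. It uses only the Python standard library. The `FileRecord` and `inventory` names are hypothetical, the file type is simply the extension, age is measured from the last-modified time, and the owner is the numeric user ID reported by the operating system (which is always 0 on Windows, where a different lookup would be needed).

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    path: str
    file_type: str    # lower-cased extension, e.g. ".pst"
    age_days: float   # days since last modification
    owner_uid: int    # numeric owner ID from the OS (0 on Windows)
    size_bytes: int


def inventory(root, now=None):
    """Walk `root` and return one FileRecord per readable file."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                st = full.stat()
            except OSError:
                continue  # skip files that vanish or cannot be read
            records.append(FileRecord(
                path=str(full),
                file_type=full.suffix.lower() or "(none)",
                age_days=(now - st.st_mtime) / 86400,
                owner_uid=st.st_uid,
                size_bytes=st.st_size,
            ))
    return records
```

Calling `inventory` on the root of a share returns a list that can be sorted by `age_days` or grouped by `file_type` before sitting down with business owners to classify the data.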

Data required for business reference, legal matters, audits and compliance needs to be preserved according to the organisation’s rules for disposition. Application retirement is another way to cut cost when reducing unstructured data. At a minimum, unstructured data rests on network file shares, which carry costly annual support and maintenance fees. It may also rest in an aging archiving application that carries costly support and maintenance fees as well as costly server and storage hardware.

Next generation cloud archiving

To reduce the cost of managing unstructured data, consider a next generation archiving application. The archiving application runs 100 per cent in the cloud and performs as a thin layer on top of the blob storage, providing collection, indexing, search and access control at a minimal cost. The archive application runs on a virtual machine with a SQL database to store metadata. Indexing, web services, encryption, Active Directory, business analytics and more are useful services to complete the archiving application.

Cloud services are consumed on an “as-needed” basis, thereby minimising cost. As storage demand scales up, low-cost storage is available instantly with unlimited capacity. For eDiscovery, compute and indexing services can be scaled up to meet high demand and a tight deadline. When eDiscovery is finished, storage and compute services can be scaled back to save cost.

Data collection is critical to the success of the archiving application. It is a mistake to assume that files will simply be copied to the new repository. The truth is that unstructured data comes in many formats and from many locations, which calls for a sophisticated approach.

Indexing dilemma

Email data is a good example. Email data can be found in on-premises email servers, email archives, journal archives and PST files. Email collection tools identify active and inactive mailboxes, rehydrate any email stubs and migrate email in its original format to the cloud repository.

SharePoint data is another good example. SharePoint sites have undoubtedly spread throughout your organisation and have created silos of information. Much of this information is old and obsolete and is consuming valuable enterprise storage. For SharePoint, the collection process provides tools to discover sites and migrate content. After data collection, data must be indexed so it can be searched for eDiscovery.

Do you want to index all the content, or would you rather search only subsets of it? Indexing consumes compute and storage resources and, depending on the amount of content to index, can be very expensive. It is better to index only the data you wish to search and so save on compute and storage cost. A simple example is to index all content for specific custodians for a specific period of time.
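The custodian-and-period example can be sketched as a simple filter that runs before any indexing work is queued. This is an illustrative sketch only: `ArchivedItem`, its fields and `select_for_indexing` are hypothetical names, not part of any particular archiving product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedItem:
    item_id: str
    custodian: str   # person whose mailbox or share the item came from
    created: date


def select_for_indexing(items, custodians, start, end):
    """Return only the items owned by the given custodians and created
    within the inclusive date range [start, end]."""
    wanted = {c.lower() for c in custodians}
    return [
        item for item in items
        if item.custodian.lower() in wanted and start <= item.created <= end
    ]


if __name__ == "__main__":
    archive = [
        ArchivedItem("a1", "alice", date(2015, 3, 1)),
        ArchivedItem("a2", "alice", date(2012, 7, 9)),
        ArchivedItem("b1", "bob", date(2015, 5, 20)),
    ]
    to_index = select_for_indexing(
        archive, ["Alice"], date(2015, 1, 1), date(2015, 12, 31)
    )
    print([item.item_id for item in to_index])
```

Only the selected subset is then passed to the indexing service, so compute and index storage scale with the scope of the search rather than with the size of the whole archive.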

The perfect platform

Along with data collection and indexing, a next generation cloud archive delivers features like hit highlighting, legal holds, tagging, saving and exporting. Full auditing and reporting provides easy access to reports for audits and legal chain-of-custody evidence, and automated disposition makes it easy to adhere to retention rules without incurring management overhead. The public cloud is the perfect platform for next generation archive applications that leverage low-cost blob storage and services.
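How automated disposition interacts with legal holds can be shown with a minimal rule. The sketch below assumes a simple policy, which is an assumption for illustration and not a description of any specific product: an item may be disposed of once its retention period has elapsed, unless it is under a legal hold. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def is_due_for_disposition(created: date, retention_days: int,
                           on_legal_hold: bool, today: date) -> bool:
    """True if the retention period has elapsed and no legal hold applies."""
    if on_legal_hold:
        return False  # a legal hold always overrides the retention schedule
    return today >= created + timedelta(days=retention_days)


if __name__ == "__main__":
    created = date(2015, 1, 1)
    # Retention elapsed, no hold.
    print(is_due_for_disposition(created, 365, False, date(2016, 6, 1)))
    # Same item, but under a legal hold.
    print(is_due_for_disposition(created, 365, True, date(2016, 6, 1)))
```

Running a check like this on a schedule, and logging each decision, is one way the archive could enforce retention rules without manual review.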

The archive application performs as a thin layer on top of cold blob storage and provides important services for collection, search, indexing and access control. The archive application runs 100 per cent in the public cloud and consumes cloud services on an “as-needed” basis, thereby optimising cost.

Image Credit: Chaiyapop Bhumiwat / Shutterstock
Bob Spurzem, Director, Field Marketing,

Bob Spurzem, Director of Field Marketing at Archive360, is a 30+ year industry veteran and recognised expert in software, archiving and eDiscovery. His résumé ranges from the ground floor of start-ups to the Fortune 100.