Follow ITProPortal:

RSS Tweet Digg

Where to begin with Deduplication

Additionally, the lifecycle of primary data can be fleeting (minutes or even seconds), so deduplicating it may be wasted effort. As a result, with a few evolving exceptions, byte-level deduplication is today aimed at the backup environment.

Another key question is where in the data centre to implement deduplication. This may not sound important, but it is the subject of a raging argument among the vendors in this part of the industry.

Some approaches implement deduplication for backup with a software ‘agent’ loaded onto each application server that undertakes backup. This spreads the deduplication workload across the processing power of all the servers involved, but, crucially, the agent must interact correctly and effectively with the existing backup software packages loaded onto those servers.

The upside of deduplicating at source is that the process is completed before any data is sent to the storage devices, minimising data transfers between server and storage.

The downside is the one encountered by any agent-based strategy: the agent must stay compatible with the server software. Any software upgrade or change on any server therefore creates a potential for incompatibility and adds to the management burden for the server administrators.
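To make the agent-based approach concrete, here is a minimal Python sketch of deduplication at source. It is an illustration under stated assumptions, not any vendor's implementation: the names `BackupStore`, `source_side_backup` and `restore` are hypothetical, and the fixed 4 KB chunk size and SHA-256 fingerprints are choices made purely for the example. The agent fingerprints the data on the server, asks the store which chunks it lacks, and sends only those.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen for illustration only


class BackupStore:
    """Hypothetical storage target that keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def missing(self, digests):
        """Return the set of digests the store does not hold yet."""
        return {d for d in digests if d not in self.chunks}

    def put(self, digest, chunk):
        self.chunks[digest] = chunk


def source_side_backup(data, store):
    """Deduplicate on the server, then send only chunks the store lacks.

    Returns (recipe, bytes_sent): the recipe is the ordered list of digests
    needed to rebuild the data; bytes_sent counts chunk payload transferred.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = store.missing(digests)
    bytes_sent = 0
    for digest, chunk in zip(digests, chunks):
        if digest in needed:
            store.put(digest, chunk)
            needed.discard(digest)  # a new chunk is sent only once
            bytes_sent += len(chunk)
    return digests, bytes_sent


def restore(recipe, store):
    """Rebuild the original data from its recipe of digests."""
    return b"".join(store.chunks[d] for d in recipe)


store = BackupStore()
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096  # one chunk changed since Monday
_, sent_monday = source_side_backup(monday, store)
recipe, sent_tuesday = source_side_backup(tuesday, store)
print(sent_monday, sent_tuesday)  # 8192 4096
assert restore(recipe, store) == tuesday
```

In this toy run the second backup transfers a single chunk, which is the bandwidth saving described above. The cost is that the fingerprinting runs on every server, inside an agent that has to be kept compatible with whatever else is installed there.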

The alternative approach is to have a dedicated platform in the backup path which handles deduplication ‘on the fly’. This effectively centralises the process.

The benefits here are that the platform, not the servers, delivers the processing power for the deduplication and, because it requires no changes to the server software, it is effectively transparent to the user. Some storage vendors are taking up the idea of embedding these functions in their storage devices, though no such products appear to exist yet.

In many ways this endorses the in-line platform as the most elegant solution, because all these vendors are doing is keeping the dedicated in-line platform but locating it inside the storage device.
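The in-line platform can be sketched in the same way. The Python below is a rough illustration under assumptions, not a description of any product: `InlineDedupPlatform` is a hypothetical name, and the fixed 4 KB chunks and SHA-256 fingerprints are again illustrative choices. The servers simply write an ordinary backup stream, and all the fingerprinting happens on the platform.

```python
import hashlib
import io

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen for illustration only


class InlineDedupPlatform:
    """Hypothetical appliance sitting between backup servers and storage."""

    def __init__(self):
        self.chunks = {}        # digest -> chunk bytes (unique chunks kept)
        self.catalog = {}       # backup name -> ordered list of digests
        self.bytes_received = 0

    def write(self, name, stream):
        """Accept an ordinary backup stream and deduplicate it as it arrives."""
        recipe = []
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            self.bytes_received += len(chunk)
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only unseen chunks
            recipe.append(digest)
        self.catalog[name] = recipe

    def read(self, name):
        """Rebuild a named backup from its stored chunks."""
        return b"".join(self.chunks[d] for d in self.catalog[name])

    def bytes_stored(self):
        return sum(len(c) for c in self.chunks.values())


platform = InlineDedupPlatform()
# Two servers send unmodified backup streams; neither runs any agent.
platform.write("server1", io.BytesIO(b"A" * 8192 + b"B" * 4096))
platform.write("server2", io.BytesIO(b"A" * 8192 + b"C" * 4096))
print(platform.bytes_received, platform.bytes_stored())  # 24576 12288
```

Note the trade-off against the source-side sketch: every byte still crosses the network to the platform, but the servers need no extra software and spend no processing power on deduplication.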

Whichever approach eventually becomes the dominant implementation, as the data deluge continues to accelerate, deduplication will rapidly become a core element of any data centre’s storage strategy. 

It is not only the storage capacity savings that are attractive, but also the support deduplication can offer for compliance (only one instance of a file makes it easier to manage, protect and delete as required) that will continue to drive this market.
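The compliance point can be illustrated with a small file-level sketch. This is a simplified assumption of how single-instance storage might behave, with the hypothetical name `SingleInstanceStore`; real products track references and retention policies in far more detail. Because identical files share one stored copy, finding every reference to that content, or deleting it, is a single operation.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical file-level store: identical files share one stored copy."""

    def __init__(self):
        self.blobs = {}  # digest -> file content (one instance per content)
        self.names = {}  # file name -> digest

    def add(self, name, content):
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # store content only once
        self.names[name] = digest

    def references(self, name):
        """List every name that points at the same stored instance."""
        digest = self.names[name]
        return sorted(n for n, d in self.names.items() if d == digest)

    def purge(self, name):
        """Delete the single instance and every name that refers to it."""
        digest = self.names[name]
        for n in self.references(name):
            del self.names[n]
        del self.blobs[digest]


store = SingleInstanceStore()
store.add("hr/contract.doc", b"confidential terms")
store.add("legal/copy-of-contract.doc", b"confidential terms")
store.add("notes.txt", b"unrelated")
print(store.references("hr/contract.doc"))
store.purge("hr/contract.doc")  # removes the one copy and both names
print(len(store.blobs), sorted(store.names))
```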




Owned & operated by: Net Communities