Where to begin with Deduplication


    16 April, 2008, by David Galton-Fenzi

    De-duplication in itself is easy to understand: it optimises the use of storage capacity by eliminating duplicated data. However, the devil is in understanding the different technologies, techniques and implementations in the market and relating these to customers' specific needs.

    Instead of storing data multiple times, de-duplication enables the data to be stored once and uses that single instance as a reference. The techniques used to do this vary. For instance, we could look for complete files which are the same, and create a single instance only when they match each other exactly.

    Alternatively, we could look at files which are broadly similar (for example, revisions of a draft document) and create a single instance of a master file, saving only the byte-level differences between this and subsequent files. So which of these approaches is best? As always, the answer is not straightforward.
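
    The byte-level approach can be sketched as follows. This is a minimal illustration only, assuming Python's standard difflib module as the comparison engine; the function names make_delta and apply_delta are hypothetical, and commercial products use their own, far more efficient differencing schemes.

```python
from difflib import SequenceMatcher


def make_delta(master: bytes, revision: bytes) -> list:
    """Describe `revision` as copies from `master` plus any new bytes."""
    ops = []
    matcher = SequenceMatcher(None, master, revision, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already held in the master: store a reference only.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in the revision: store them literally.
            ops.append(("data", revision[j1:j2]))
        # "delete" needs no entry: those master bytes are simply not copied.
    return ops


def apply_delta(master: bytes, ops: list) -> bytes:
    """Rebuild the revision from the master file and its stored delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += master[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


master = b"The quick brown fox"
revision = b"The quick red fox jumps"
delta = make_delta(master, revision)
new_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
```

    Only the master file and the "data" entries need to be kept; the revision is rebuilt on demand with apply_delta(master, delta).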

    If we look at the first of these, working at a file level rather than a byte level, there are well-established techniques such as CAS (Content Addressable Storage). With this approach the contents of the file are put through a mathematical mincer (in practice, a hash function) and the end product is an identifier, unique for all practical purposes, which is attached to the file.


    If exactly the same file exists somewhere else in the system, the mathematical mincer will produce exactly the same identifier – indicating a duplicate file which can be made into a single instance. 
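
    As a rough sketch of the idea, assuming SHA-256 as the "mathematical mincer" (the article does not name a specific function) and a hypothetical in-memory class called SingleInstanceStore, file-level single instancing might look like this:

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical files are kept only once."""

    def __init__(self):
        self.blobs = {}  # identifier -> file contents (one copy each)
        self.names = {}  # file name -> identifier

    def put(self, name: str, content: bytes) -> str:
        # The identifier is derived purely from the file's contents.
        identifier = hashlib.sha256(content).hexdigest()
        # Store the contents only if this identifier has not been seen.
        self.blobs.setdefault(identifier, content)
        self.names[name] = identifier
        return identifier

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]


store = SingleInstanceStore()
id_a = store.put("reports/q1.doc", b"Quarterly figures")
id_b = store.put("backup/q1-copy.doc", b"Quarterly figures")
```

    Both names resolve to the same identifier, so the contents occupy space only once while each file can still be retrieved by its own name.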

    Using this approach, every time a spelling mistake is corrected, or punctuation is added to a document, a new identifier would be created and both versions of the document stored in full.
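
    The effect is easy to demonstrate. The snippet below is illustrative only, again assuming SHA-256 as the identifier function; the helper name identifier and the sample text are made up for the example.

```python
import hashlib


def identifier(content: bytes) -> str:
    # Assumed identifier function; any change to the input changes the result.
    return hashlib.sha256(content).hexdigest()


draft = b"Dedupliction saves space."      # contains a spelling mistake
corrected = b"Deduplication saves space."  # one letter added

# A file-level store keyed by identifier ends up holding both versions.
stored = {identifier(version): version for version in (draft, corrected)}
bytes_stored = sum(len(version) for version in stored.values())
```

    Although the two versions differ by a single letter, their identifiers differ, so bytes_stored is the combined size of both documents.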

    Tags: Business Continuity, Data Management, Information Life Cycle, Information/Data handling