
Source Attribution, Don’t Leave Home Without It

I use the word “attribution” to refer to pedigree-related metadata that is associated with a piece of data, e.g., the data source, transaction number, author, timeframe, location, etc.

When dealing with information sharing, “source attribution” is essential. By source attribution I mean the referential metadata that uniquely identifies the original copy of the record, e.g., the originating system (the “system of record”) and the unique transaction ID.
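To make the definition concrete, here is a minimal Python sketch of a record that carries its source attribution. The class and field names (`SourceAttribution`, `system_of_record`, `transaction_id`, `SharedRecord`) are hypothetical illustrations, not a standard schema. The only assumption is that the originating system's name and the transaction ID, taken together, uniquely identify the original copy.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SourceAttribution:
    """Pointer back to the original copy of a record (hypothetical fields)."""
    system_of_record: str  # identifier of the originating system
    transaction_id: str    # unique transaction ID within that system

    def key(self) -> Tuple[str, str]:
        # The pair, not either field alone, identifies the original record.
        return (self.system_of_record, self.transaction_id)


@dataclass
class SharedRecord:
    """A record as held by a secondary system, carrying its pedigree."""
    source: SourceAttribution
    values: Dict[str, str]
```

For example, `SharedRecord(SourceAttribution("HR-DB", "txn-0042"), {"name": "Jane Doe"})` is a copy that can always be traced back to transaction `txn-0042` in `HR-DB`.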

Systems containing records without source attribution metadata have some rather unfriendly characteristics, specifically:

Accuracy and currency degradation: Without source attribution, reliable data tethering is not possible. When a record is updated or deleted in the system of record, there is no dependable way to find and correct the copies of it held in secondary systems. Source attribution enables synchronization throughout an information-sharing food chain.
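A rough sketch of how source attribution makes that tethering possible, assuming (hypothetically) that the system of record emits change notices naming itself and the transaction ID, and that a secondary system indexes its copies by that same pair. `ChangeEvent` and `apply_change` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (system_of_record, transaction_id)


@dataclass
class ChangeEvent:
    # Hypothetical change notice emitted by a system of record.
    system_of_record: str
    transaction_id: str
    values: Optional[dict]  # None means the record was deleted at the source


def apply_change(secondary: Dict[Key, dict], event: ChangeEvent) -> bool:
    """Correct a secondary copy using its attribution key.

    Returns True if a copy held by the secondary system was changed.
    """
    key = (event.system_of_record, event.transaction_id)
    if key not in secondary:
        return False  # this record was never shared with us
    if event.values is None:
        del secondary[key]  # deleted at the source: remove our copy too
    else:
        secondary[key] = dict(event.values)  # updated at the source
    return True
```

Without the `(system_of_record, transaction_id)` key, the secondary system would have to guess which of its copies an update or deletion refers to, which is exactly the degradation described above.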

Un-auditable: Without source attribution, information sharing systems are not auditable! One must have source attribution if one wants to ensure that records and values in System A are accurately reflected in System B. For example, how else can one be sure that watch list records deleted in one system are correctly removed from secondary systems?
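The watch list example can be sketched as a simple reconciliation, assuming System A is the system of record (keyed by transaction ID) and System B holds copies tagged with an optional `(system_of_record, transaction_id)` pair. The `audit` function and its finding categories are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Source = Optional[Tuple[str, str]]  # (system_of_record, transaction_id) or None


def audit(system_a: Dict[str, dict],
          system_b: List[Tuple[Source, dict]],
          source_name: str) -> dict:
    """Check copies in B that claim to come from A against A's current state."""
    findings = {"stale": [], "mismatched": [], "unauditable": 0}
    for source, values in system_b:
        if source is None:
            # No pedigree pointer: nothing to check this copy against.
            findings["unauditable"] += 1
            continue
        system, txn = source
        if system != source_name:
            continue  # copy came from some other system; out of scope here
        if txn not in system_a:
            # Deleted (or never present) at the source, but still held in B.
            findings["stale"].append(txn)
        elif system_a[txn] != values:
            # Present at the source, but B's copy has drifted.
            findings["mismatched"].append(txn)
    return findings
```

Copies with no attribution can only be counted, never checked; a deleted watch list entry surfaces as `stale` only because its copy still names its origin.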

Notably, I have seen some talk of anonymization systems whereby it is provable that one does not know the originating system and transaction. While this might be appropriate in some contexts, when building systems that must be current and auditable, source attribution is a must.

Anonymization systems without source attribution are provably un-auditable!

From a privacy perspective, when thinking about information transfers (which for many reasons can and should be minimized), all information transfers should include source attribution (whether it is an anonymized information transfer or not).

Otherwise, organizations cannot audit their systems and cannot ensure a “non-arbitrary” process or outcome … a fundamental human rights “no-no”.

This is because if an authority is operating on a “fact” whose source it cannot establish … how could its actions be provably non-arbitrary? [See: Responsible Innovation: Designing for Human Rights]

Systems with guaranteed source attribution are capable of being accurate, current and auditable, and in the context of anonymization-based systems such pedigree pointers enable “selective revelation” … whereby the original data holder has complete control over any revelation (de-cloaking) event, based on law and policy.
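One way selective revelation might be sketched, assuming the data holder replaces each pedigree pointer with a keyed hash (HMAC-SHA256 here, a swapped-in choice rather than anything this post prescribes) and keeps the only mapping back to the real system and transaction ID. `DataHolder`, `cloak`, `reveal`, and the `policy` callback are hypothetical names; the policy function stands in for whatever law and policy govern de-cloaking.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict, Optional, Tuple


class DataHolder:
    """Original data holder that cloaks its pedigree pointers (hypothetical)."""

    def __init__(self, system_of_record: str) -> None:
        self.system_of_record = system_of_record
        self._key = secrets.token_bytes(32)  # secret key; never leaves the holder
        self._pointers: Dict[str, Tuple[str, str]] = {}

    def cloak(self, transaction_id: str) -> str:
        """Return an opaque pointer that is stable for a given record."""
        msg = f"{self.system_of_record}\x00{transaction_id}".encode()
        pointer = hmac.new(self._key, msg, hashlib.sha256).hexdigest()
        self._pointers[pointer] = (self.system_of_record, transaction_id)
        return pointer

    def reveal(self, pointer: str, request: dict,
               policy: Callable[[dict], bool]) -> Optional[Tuple[str, str]]:
        """De-cloak only if the holder's policy check approves the request."""
        if not policy(request):
            return None
        return self._pointers.get(pointer)
```

Because the pointer is stable for a given record, recipients can still reference it in updates and audits, while the original holder alone decides whether to reveal what it points to.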

Postings on this site don’t necessarily represent IBM’s positions, strategies or opinions.

Jeff Jonas is the chief scientist of IBM Software Group’s Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness. Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Netcrime Blog.
