Data Tethering: Managing the Echo

When one system transfers data to another, what happens when the original data changes? Or what if the policies governing that data change after it has been transferred?

Say the manager of a government watch list sends that list to a secondary organization. Later, someone is cleared (removed) from the list. What assurance is there that the cleared individual will also be removed from the secondary organization’s copy? Now imagine how complicated this becomes if the recipient re-distributes the list (a cascading transfer) to tertiary organizations, and so on.

Some organizations sell their customer lists to secondary organizations, e.g., a marketing alliance partner. What if one of those customers requests that their name and address not be sold, and what if they ask for their information to be redacted from the secondary organizations that have already received it?

Guess what? Bad news. Most organizations don’t even know which customer records were transferred (at least not at the individual-customer level); they likely only know what extract criteria were used, on what date, and the total record count.
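One remedy is to log transfers at the record level, so a later redaction request can be routed to every recipient of that specific record. Here is a minimal sketch in Python, assuming a hypothetical per-record ledger kept by the sending organization (the names TransferLedger, TransferManifest, and record_transfer are illustrative, not from any particular product):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TransferManifest:
    """Record-level log of which records were sent to which recipient."""
    recipient: str
    extract_criteria: str
    transferred_at: datetime
    record_ids: set[str] = field(default_factory=set)


class TransferLedger:
    """Keeps per-record transfer history so later redactions can be routed."""

    def __init__(self) -> None:
        self._manifests: list[TransferManifest] = []

    def record_transfer(self, recipient: str, criteria: str, record_ids: set[str]) -> None:
        self._manifests.append(
            TransferManifest(recipient, criteria, datetime.now(timezone.utc), set(record_ids))
        )

    def recipients_of(self, record_id: str) -> set[str]:
        """Every downstream party that ever received this record."""
        return {m.recipient for m in self._manifests if record_id in m.record_ids}


# Usage: when customer "cust-0042" asks to be redacted, the ledger says
# exactly which secondary organizations must be notified.
ledger = TransferLedger()
ledger.record_transfer("marketing-alliance-partner", "opt_in = true", {"cust-0042", "cust-0099"})
print(ledger.recipients_of("cust-0042"))  # {'marketing-alliance-partner'}
```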

Data tethering means that when data changes at its source, the change is reflected through the entire food chain. Every copied piece of data is virtually “tethered” to its master copy. Non-tethered systems contain errors until the next database reload, and the greater the window between refreshes, the greater the error rate.
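A minimal sketch of the idea in Python, assuming a simple publish/subscribe arrangement in which each downstream copy applies source changes and re-publishes them to its own subscribers (the TetheredCopy class and the change-event format are hypothetical illustrations, not a description of any specific system):

```python
# A change event: ("upsert" | "delete", record_id, payload)
Change = tuple[str, str, dict]


class TetheredCopy:
    """A downstream copy that applies source changes and re-publishes them
    to its own subscribers, so corrections reach every tier."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: dict[str, dict] = {}
        self.subscribers: list["TetheredCopy"] = []

    def subscribe(self, downstream: "TetheredCopy") -> None:
        self.subscribers.append(downstream)

    def apply(self, change: Change) -> None:
        op, record_id, payload = change
        if op == "upsert":
            self.records[record_id] = payload
        elif op == "delete":
            self.records.pop(record_id, None)
        # Cascade the same change to every downstream copy.
        for sub in self.subscribers:
            sub.apply(change)


# Usage: clearing someone from the source list removes them everywhere.
source = TetheredCopy("source")
org_b = TetheredCopy("org-b")
org_x = TetheredCopy("org-x")
source.subscribe(org_b)
org_b.subscribe(org_x)

source.apply(("upsert", "person-17", {"name": "J. Doe"}))
source.apply(("delete", "person-17", {}))
assert "person-17" not in org_x.records
```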

Non-tethered systems in national security and law enforcement settings are problematic, as there can be real privacy and civil liberties consequences when organizations operate on incorrect data points. And they waste resources to boot.

Data tethering is an important design element when thinking about responsible innovation.

[Miscellaneous note: From a manageability perspective, there may be a reasonable upper bound on cascading data transfers. For example, in settings like watch lists it may be ideal to mandate no more than two tiers of transfer (e.g., Source A transfers to B and C, then B re-transfers to X, Y, and Z). Perhaps public records are stipulated for a three-tier maximum. The point being: if there are too many tiers, it will not be possible to ensure currency and accuracy across the network.]
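To illustrate the note above, here is a hedged sketch of how such a tier ceiling might be enforced, assuming each transfer carries the tier at which it was received (TransferEnvelope, retransfer, and MAX_TIERS are hypothetical names for illustration only):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferEnvelope:
    """Wraps a data extract with the tier at which it was received."""
    data: tuple
    tier: int  # 1 = received directly from the source


MAX_TIERS = 2  # e.g., a watch-list policy allowing source -> B -> X but no further


def retransfer(envelope: TransferEnvelope) -> TransferEnvelope:
    """Produce an envelope for the next tier, refusing to exceed the policy limit."""
    if envelope.tier >= MAX_TIERS:
        raise PermissionError(
            f"Re-transfer refused: tier {envelope.tier} already at the {MAX_TIERS}-tier limit"
        )
    return TransferEnvelope(envelope.data, envelope.tier + 1)


# Usage: B (tier 1) may re-transfer to X (tier 2); X may not re-transfer again.
from_source = TransferEnvelope(("person-17",), tier=1)
to_x = retransfer(from_source)   # allowed: tier 2
# retransfer(to_x)               # would raise PermissionError
```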

Postings on this site don’t necessarily represent IBM’s positions, strategies or opinions.

Jeff Jonas is the chief scientist of IBM Software Group’s Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness. Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Netcrime Blog.

For more on Entity Analytics, click here.