To Know Semantic Reconciliation is to Love Semantic Reconciliation

Semantic reconciliation is possibly the most fundamental building block required to make intelligent systems intelligent. When I say "semantic reconciliation," I mean: "Recognizing when two objects are the same despite having been described differently." Or put more simply, this is about counting like things.

In disease research, one would need to know the difference between six reported cases of lupus and one case reported six times. A 911 operator receives emergency calls from six people, each reporting the sound of gunshots. Is this one incident, six separate incidents, or somewhere in between?

I stayed at a W Hotel a few weeks ago. They asked me if I was in the loyalty club program. I did not know, so I had them look. It turns out I am in the loyalty club program three times. They think I am three different customers when in fact I am one. They don’t know me! (Ironically, I checked into a different W Hotel last night and they could not find any loyalty club records for me whatsoever.)

If all data collected contained globally unique identifiers (e.g., a bar-coded serial number), then semantic reconciliation would be trivial. But the world collects different features in different ways from the same object. Some systems record me as Jeff Jonas and others as Jeffrey Jonas. Sometimes I share a frequent flyer number and no date of birth, and in other places I share a date of birth and passport number. So how many Jeff Jonases are there? Organizations that cannot count unique objects make suboptimal decisions. In the case of the multiple loyalty club accounts, that may mean denying a decent customer the decent rewards he would have earned had all the points been recognized as belonging to one account!
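To make the counting problem concrete, here is a minimal sketch in Python. It is not how any particular product does this. The record layout, the field names (`frequent_flyer`, `passport`, `dob`), the values, and the rule that a shared frequent flyer number or passport number is enough to link two records are all assumptions made up for illustration. The name alone is deliberately not treated as a linking feature.

```python
from collections import defaultdict

# Hypothetical records: field names and values are invented for illustration.
RECORDS = [
    {"id": "r1", "name": "Jeff Jonas", "frequent_flyer": "FF-1001"},
    {"id": "r2", "name": "Jeffrey Jonas", "frequent_flyer": "FF-1001", "passport": "P-777"},
    {"id": "r3", "name": "Jeffrey Jonas", "dob": "1970-01-01", "passport": "P-777"},
    {"id": "r4", "name": "Jeff Jonas", "dob": "1985-05-05"},
]

# Assumption: these fields are strong enough to link two records on their own.
STRONG_KEYS = ("frequent_flyer", "passport")


def resolve(records):
    """Group records that share any strong identifier, directly or transitively."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the representative record, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (field, value) -> id of the first record carrying it
    for r in records:
        for key in STRONG_KEYS:
            value = r.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(first_seen[token], r["id"])
            else:
                first_seen[token] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(sorted(g) for g in groups.values())


entities = resolve(RECORDS)
print(len(entities), entities)
```

In this toy data, r1 and r3 share no feature directly, yet both link to r2, so the three collapse into one entity. The fourth record has the same name but no strong identifier in common, so it stays separate. A real system would also have to weigh fuzzy name matches, conflicting dates of birth, and shared or mistyped identifiers, none of which this sketch attempts.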

It is important to address semantic reconciliation before other analytical processes (e.g., statistical analysis, market segmentation, link analysis). This is a "first things first" principle because semantic reconciliation makes secondary analytic and computational problems that much easier and that much more accurate.

And, while my primary focus over the years has been the semantic reconciliation of identities (people and organizations) with attention to massive scale and subtle little nuances like sequence neutrality, similar techniques are possible for many other things (e.g., in Las Vegas the Starbucks on the corner of Sahara and Maryland Parkway happens to be the same as the Starbucks at 2595 S. Maryland Parkway).

If one cannot count discrete objects, one cannot properly construct context. And when organizations make decisions without context, brace yourself for bad decisions and say hello to more Enterprise Amnesia!

Postings on this site don't necessarily represent IBM's positions, strategies or opinions.

Jeff Jonas is the chief scientist of IBM Software Group's Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness. Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Security Blog.
