Discoverability: The First Information Sharing Principle

Guest Column by Jeff Jonas

As I mentioned in my posting of last week, enterprise information must be registered in the card catalog or it cannot be located in any efficient manner.

Here are two key points about this process.

What goes in the catalog? The short answer is metadata. For example, at the library it is subject, title and author. Maybe your mission needs who, what, where and when. Deciding what to include is actually one of the hard parts.

If you make the catalog too elaborate out of the gate, you will be doomed by various complexities. So if you have not already done this, I recommend selecting just the most basic attributes first.

Generally one wants to include: (a) enough attributes to determine when future objects (e.g., documents, people, things) are the same, semantic reconciliation being the technical term for this, (b) attributes that help relate the object to other objects (e.g., addresses can play a role in relating people), and (c) the attribution/pedigree (e.g., source attribution) including date/time and location. There are some other categories, but this is a good starter kit.
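To make the starter kit concrete, here is a minimal sketch of what one catalog entry might look like. Everything in it is an assumption for illustration: the names `CatalogEntry`, `Pedigree` and `same_object` are hypothetical, and the matching function is a naive exact-match stand-in, not a real semantic reconciliation method.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Pedigree:
    """(c) Attribution: which system supplied the entry, when, and where."""
    source_system: str
    recorded_at: datetime
    location: str


@dataclass
class CatalogEntry:
    """One card in the catalog: metadata only, never the full source record."""
    identity: dict       # (a) attributes used to decide whether two objects are the same
    links: dict          # (b) attributes that relate this object to other objects
    pedigree: Pedigree   # (c) source attribution with date/time and location


def same_object(a: CatalogEntry, b: CatalogEntry, keys: tuple) -> bool:
    """Naive stand-in for semantic reconciliation (an assumption, not a real method).

    Two entries match only if every chosen identity key is present on both
    and the values agree after trimming whitespace and lowercasing.
    """
    def norm(value) -> str:
        return str(value).strip().lower()

    return bool(keys) and all(
        k in a.identity and k in b.identity
        and norm(a.identity[k]) == norm(b.identity[k])
        for k in keys
    )


# Made-up example values, for illustration only.
stamp = datetime(2006, 1, 1, tzinfo=timezone.utc)
entry_a = CatalogEntry({"name": "Pat Smith "}, {"address": "12 Main St"},
                       Pedigree("system_a", stamp, "site_1"))
entry_b = CatalogEntry({"name": "pat smith"}, {"address": "12 Main St"},
                       Pedigree("system_b", stamp, "site_2"))
```

Starting with a handful of identity keys keeps the entry small, which fits the advice above to begin with only the most basic attributes.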

What is going to prompt a system custodian to give you any catalog metadata in the first place? The answer should be because they care about the enterprise mission. But that generally won’t cut it. So what you will probably need is "policy" followed by budget authority. Here is the approach I would try.

If a system places metadata in the card catalog, then its information will be discoverable across the enterprise at large. This is good. If a system does not put metadata into the card catalog, its information is not discoverable. This is bad. Therefore, using the metric "percent of data registered in the directory", a budget authority can quantify which systems have the greater potential for enterprise value.

Budget authorities then use budget to "reward" systems with the most discoverability as they will have the greater potential to create enterprise value.
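The "percent of data registered in the directory" metric is simple arithmetic, and a short sketch may help show how a budget authority could compare systems with it. The function names and the idea of counting records are my assumptions here; how "data" is actually counted would be a policy decision.

```python
def percent_registered(registered: int, total: int) -> float:
    """Percent of a system's records that have metadata in the directory."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= registered <= total:
        raise ValueError("registered must be between 0 and total")
    return 100.0 * registered / total


def rank_by_discoverability(systems: dict) -> list:
    """Rank systems from most to least discoverable.

    `systems` maps a system name to a (registered, total) pair of record counts.
    Returns a list of (name, percent) pairs sorted high to low.
    """
    scored = [(name, percent_registered(r, t)) for name, (r, t) in systems.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A system with 9 of 10 records registered would rank above one with 50 of 100, since the metric measures coverage rather than raw volume.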

The role of directories in information sharing is nothing new of course. For example, the Markle Foundation’s Task Force on National Security in the Information Age discusses the importance of directories in its third report entitled "Mobilizing Information to Prevent Terrorism" (pages 59-61). [Truth in advertising: I am a proud member of this Markle Task Force.]

Directories play a key role in reducing the amount of data flowing in the network. Because only limited attributes (metadata) are transferred to the directory, most data remains with its original custodian, which has decent privacy ramifications. And coincidentally, this directory-based architecture is, in my opinion, the only technically viable solution for large-scale information sharing initiatives.
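As a rough sketch of the idea, the directory below stores only metadata values and pointers back to the custodian, never the underlying records. The class and method names are hypothetical, and a real directory would need far more than this (security, scale, fuzzy matching), so treat it as an illustration under those assumptions.

```python
from collections import defaultdict


class Directory:
    """Central index that maps (attribute, value) to pointers, not to data."""

    def __init__(self):
        # (attribute, normalized value) -> set of (custodian, record_id) pointers
        self._index = defaultdict(set)

    def register(self, custodian: str, record_id: str, metadata: dict) -> None:
        """A custodian publishes limited metadata; the full record stays home."""
        for attr, value in metadata.items():
            self._index[(attr, str(value).lower())].add((custodian, record_id))

    def lookup(self, attr: str, value: str) -> set:
        """Return pointers to custodians holding a matching record."""
        return set(self._index.get((attr, str(value).lower()), set()))
```

A lookup returns only where matching records live. Fetching the record itself would be a separate request to the custodian, which is why most data never has to leave its original system.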

Postings on this site don’t necessarily represent IBM’s positions, strategies or opinions.

Jeff Jonas is the chief scientist of IBM Software Group’s Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness. Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Netcrime Blog.
