I kept thinking I should title this post, "Knock. Knock. Who’s there?" But I scrubbed that idea. This post is related to national security, intelligence, classification and information sharing. If this is not your domain, my comments below will make no sense.
Back in pre-9/11 days, those holding classified data used to apply the "Need to Know" model when considering information sharing. Following 9/11, there has been a call for improved information sharing, which has resulted in the new mantra "Need to Share."
What do you think is the difference between "need to know" and "need to share"? When push comes to shove, most people I speak with in the mission cannot quite articulate the difference. Not good.
In part, "need to share" involves a new mindset. This new mindset was highlighted in the third report issued by the Markle Foundation’s Task Force on National Security in the Information Age. For example, in this report we called for an increased use of tearline reporting and a decreased use of ORCON designations. (see report pages 44-48)
There is another aspect, however, that, while referenced in our report (see report pages 46 and 61), may be lacking the attention it deserves. And this is the subject of data indices. These are so fundamental to implementing a functional information sharing program, that I might hazard to say that without data indices … there is little to no hope information sharing will ever be solved. Let me explain.
If someone is the custodian of a highly relevant data item how will they "know who needs to know?" And conversely, if someone else is in need of this highly relevant data item how will they "know whom to ask?" Basically the problem is: who needs to know what?
Example: How will the folks working on counter-proliferation know they have a record that is directly related to another team specializing in anti-money laundering? The chances these two groups (even if working in the same building ... ouch) will actually recognize they have related data points is close to Z E R O. If there were just these two groups, the problem would be trivial and could be worked out. But in the real world, organizations may have hundreds of isolated data sets. On whose door shall I knock?
In this earlier post I introduced the Information Sharing Paradox. This paradox basically states that if everyone cannot share everything with everyone else, and everyone cannot ask everyone else every question every day … then how is someone going to find something?
The answer of course is one must first solve "discovery," i.e., knowing who to ask for what. All large scale discovery problems are solved by central indexes (data registries with pointers). Be advised, discovery is not solved by a federated search where one broadcasts searches across the enterprise. And if you hear that federated search is the solution, be afraid, be very afraid.
In order for "need to share" to fulfill its full potential, data custodians must first publish (limited) metadata to the central index. More precisely, when I say "publish data," in actuality they will need to use data tethering to ensure all adds, changes and deletes are properly reflected in the index. At libraries, index metadata about new documents includes subject, title and author. In your business this limited metadata is more likely to be something like who, what, where, when, etc.
As central indexes will be the means by which information discovery challenges are solved, this becomes a way to begin focusing the privacy and civil liberties debate.
One privacy related tension will be defining exactly what kind of data should be discoverable, i.e., placed in the index? For example, in counter-terrorism information sharing programs, there would be significant controversy over, say, including pharmaceutical prescription information of all US citizens; whereas, including foreigners banned from traveling to the US would probably cause little to no concern. The subject of discoverability (i.e., selecting which data will live in the central index) deserves much debate.
On the good news front, solving discoverability via central indexes brings with it a few useful privacy protections including: a) urges to share more data with more parties is replaced by transferring less information to one place (the central index), b) who is searching for what and what they found can be logged (e.g., using immutable audit logs) in a consistent manner thus facilitating better accountability and oversight, and c) information sharing between parties is now reduced to just the records that they need to know and need to share (sharing less by sharing only information that must be shared), and d) it is now possible to make the index anonymized (see: Anonymized Semantic Indexes), which means the risk of unintended disclosure of even the limited metadata in the index is drastically reduced.
Whether living in the "need to know" world or the "need to share" word, one must first be able to answer the question "who" and "what"; otherwise, this dog won’t hunt.
Postings on this site don't necessarily represent IBM's positions, strategies or opinions.
Jeff Jonas is the chief scientist of IBM Software Group's Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness. Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Security Blog.
For more on Entity Analytics, click here.