The Cato Institute has released a paper that Jim Harper and I co-authored, one we have been working on for about two years. One of the big challenges we faced in drafting it was dealing with all of the confusion surrounding data mining. It turns out that what counts as data mining depends on whom you ask.
The key point of our paper is that the form of data mining that uses historical incident data to derive a pattern, and then applies that pattern to predict a future event, is not helpful in the terrorism context, because there isn't enough historical data to derive a meaningful and statistically reliable pattern. Thus, we settled on the term "predictive data mining" to differentiate what we were characterizing as ineffective from the many other effective uses.
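The statistical problem can be sketched with a toy base-rate calculation. All of the numbers below are hypothetical assumptions chosen for illustration, not figures from the paper: even a classifier far more accurate than any pattern that could realistically be derived from sparse incident data would bury the handful of true hits under millions of false alarms.

```python
# Toy base-rate illustration (all numbers are hypothetical assumptions).
population = 300_000_000        # assumed screening population
actual_bad_actors = 3_000       # assumed number of true targets in it

# Assume an implausibly good classifier for a rare-event problem:
# it catches 99% of true targets and misflags only 1% of everyone else.
true_positive_rate = 0.99
false_positive_rate = 0.01

flagged_true = actual_bad_actors * true_positive_rate
flagged_false = (population - actual_bad_actors) * false_positive_rate

# Precision: of everyone flagged, what fraction is actually a target?
precision = flagged_true / (flagged_true + flagged_false)

print(f"Total people flagged: {flagged_true + flagged_false:,.0f}")
print(f"Share of flags that are real targets: {precision:.4%}")
```

Even under these generous assumptions, roughly three million people are flagged and about 99.9% of the flags are false positives, which is the core reason low-incident events defeat pattern-based prediction.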
This paper also highlights a real governmental need to efficiently locate, access, and aggregate information about specific suspects. To illustrate this point, we show that, starting with two primary suspects and using available data points under existing laws, a good number of the 9/11 terrorists could have been identified through a very narrow investigation before September 11th.
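The contrast with predictive data mining can be sketched as subject-based link expansion: rather than mining the whole population for a pattern, an investigator starts from known suspects and follows shared identifiers outward. The records and field names below are entirely hypothetical, a minimal sketch of the idea rather than the paper's actual 9/11 reconstruction.

```python
# Minimal sketch of subject-based link expansion (hypothetical data).
from collections import deque

# Hypothetical records: person -> identifiers they are associated with
# (addresses, phone numbers, etc.). Not real investigative data.
records = {
    "suspect_A": {"addr_1", "phone_1"},
    "suspect_B": {"addr_2", "phone_2"},
    "person_C":  {"addr_1", "phone_3"},   # shares an address with suspect_A
    "person_D":  {"phone_3", "addr_4"},   # shares a phone with person_C
    "person_E":  {"addr_9"},              # no link to any suspect
}

def expand_from_seeds(records, seeds):
    """Breadth-first expansion: flag anyone tied to a seed by a shared identifier."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for person, ids in records.items():
            if person not in found and ids & records[current]:
                found.add(person)
                queue.append(person)
    return found

linked = expand_from_seeds(records, {"suspect_A", "suspect_B"})
print(sorted(linked))  # person_E, with no link to any seed, is never flagged
```

The design point: the search only ever touches people connected to the seeds, so unconnected individuals like `person_E` are never examined at all, which is what makes the approach narrow in the sense the paper describes.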
Make no mistake about it: data mining has many valuable uses, from reducing corporate direct-marketing costs to classifying celestial objects and even medical research. It just happens to be unhelpful for discovering underlying patterns of low-incident terrorism.
Jeff Jonas is the chief scientist of IBM Software Group’s Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness. Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Netcrime Blog.
For more on Entity Analytics, see IBM's Entity Analytics pages.