Had I been blogging in 2001, this would have been posted then. And although I posted something similar to this on May 16th of last year … I am posting this now as some of the reporting to date around my work in this area has been overstated and/or inaccurate.
Following September 11th, many newspaper and magazine stories began showing how the hijackers were related to each other and ultimately to Osama Bin Laden. With these pictures came suggestions that the event could have been prevented had the government had access to much more data (e.g., health care records, banking records, communications, etc.). There also appeared to be an emerging consensus that by studying merely the shape of the 9/11 network, one might be able to locate similarly shaped networks, thus detecting and preempting future events.
I disagreed with this thinking. In fact, it was my opinion that, at least in the case of 9/11, neither more large data sets to graph the nation nor hunting for similar network shapes in this graph would have been necessary (or even useful) for detecting and preempting this event.
Ever see someone standing in front of a giant graph? Imagine a picture with millions of nodes connected by millions of lines of varying thickness and color. Think of a spaghetti-fest ready to feed 10,000 people. While very impressive to look at, such a picture is not useful for establishing a starting point.
Networks are useful when one has an entrance point. From a specific vantage point one has a string to pull. In the case of criminal investigations, these starting points are "predicates." By this I mean knowledge about something or someone that meets some threshold of being reasonable and particular (calibrated with respect to the crime, i.e., a different threshold for a deadbeat dad versus a nuclear threat), and that justifies some further action.
From this predicate one begins an investigation or inspection, pulling the string and marching down the path toward the ultimate fact: whether someone is planning something or has done some bad act. When an investigation is started without a sufficient predicate, or starting point, one risks rampant false positives, which not only waste resources but also bring investigative attention to the innocent, resulting in unnecessary intrusion on our privacy and, worse, our civil liberties.
While the question of "what is a predicate" is worthy of a longer conversation and debate, in the case of the 9/11 hijackers, there were two perfect starting points. Both Nawaf Alhamzi and Khalid Al-Midhar were already known to the US government to be very bad men. They should have never been let into the US, yet they were living in the US and were hiding in plain sight – using their real names.
When 1+1=13: Starting with these two guys I drew from various public sources (e.g., investigative journalism, grand jury indictment, etc.) to demonstrate how the network would have looked. In short, with basic investigative procedures, I demonstrated that at least 13 of the 19 could have been exposed.
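The mechanics of this kind of predicate-based expansion can be sketched in a few lines of code: start with the known subjects, find everyone who shares an identifier (an address, a phone number, a frequent flyer number) with them, add those people to the set of subjects, and repeat. The sketch below uses entirely hypothetical names and records for illustration; it is not the actual data or any particular product's implementation.

```python
from collections import deque

# Hypothetical records: person -> identifiers they used
# (shared addresses, phone numbers, frequent flyer numbers).
records = {
    "subject_A": {"addr:123 Elm St", "phone:555-0101"},
    "subject_B": {"phone:555-0101", "ff:QF-771"},
    "associate_1": {"addr:123 Elm St", "phone:555-0199"},
    "associate_2": {"ff:QF-771"},
    "unrelated": {"addr:9 Oak Ave"},
}

def expand(predicates, records):
    """Breadth-first expansion from known subjects ("predicates"):
    anyone sharing an identifier with a subject becomes a subject,
    and the process repeats until no new people are found."""
    # Invert the records: identifier -> set of people who used it.
    by_id = {}
    for person, ids in records.items():
        for ident in ids:
            by_id.setdefault(ident, set()).add(person)

    found, queue = set(predicates), deque(predicates)
    while queue:
        person = queue.popleft()
        for ident in records.get(person, ()):
            for other in by_id[ident]:
                if other not in found:
                    found.add(other)
                    queue.append(other)
    return found

# Pulling the string from one known bad actor surfaces the cluster
# connected to him, while leaving unconnected people untouched.
network = expand({"subject_A"}, records)
```

Note that the process never needs a picture of the whole graph: it only ever touches records one hop away from people already under suspicion, which is exactly why rampant false positives are avoided.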
So, back in the day when running SRD, I created a series of PowerPoint charts to illustrate exactly this point.
This was first published on page 28 of the Markle Foundation’s report entitled "Protecting America’s Freedom in the Information Age." Since then, it has found its way into a number of other publications (e.g., Newsweek: Geek War on Terror).
From time-to-time, though, this work has been characterized incorrectly. For example:
It has been said that the data was run through NORA to develop this analysis. Nope. I never had this data. Rather, I just analyzed the open source and told the story – which required no computational power at all.
It has also been said that had NORA been in use by the US government, 9/11 would have been prevented. Ha Ha! The whole point of my 9/11 analysis was that the government did not need mounds of data, did not need new technology, and in fact did not need any new laws to unravel this event!
Just to be clear, I am not saying better technology and better laws would not be helpful. Obviously, our government needs both. I am simply saying that according to my analysis 9/11 very possibly could have been averted without either. I attempted to make this point in my most recent paper entitled "Effective Counterterrorism and the Limited Role of Predictive Data Mining." In this paper my co-author Jim Harper of the Cato Institute and I were able to draw upon new insights revealed in the 9/11 Commission Report to more clearly describe just how effective predicate-based link analysis would have been in the context of 9/11.
One more thing: I am often asked how false positives would have affected my 9/11 analysis had such an investigation been carried out in the real world. The relationships selected for this demonstration involved solely shared addresses, phone numbers, and frequent flyer numbers. When constrained by date ranges, the number of additional parties would likely have been minimal, unless the addresses and phone numbers on the plane reservations were actually those of the travel agency (which was not revealed in open source documents). As such, I posit that the investigation would have produced a small universe of subjects and would have revealed the likes of Mohamed Atta.
Postings on this site don't necessarily represent IBM's positions, strategies or opinions. Jeff Jonas is the chief scientist of IBM Software Group's Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness; Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Security Blog.
For more on Entity Analytics, click here.