Accumulating Context: Now or Never

Sensing importance across a sea of dynamic systems with constantly changing data requires the accumulation and persistence of context. (I am using the term persistence here to mean storing/saving what one has observed and learned – in a database, for example.)

If a system does not assemble and persist context as it comes to know it … the computational costs to reconstruct context after the fact are too high. Therefore, a system will be more intelligent when it can persist context on data streams … and less intelligent when it does not.

[Sidebar: After explaining this to my lawyer friend Peter Swire, he said this is nothing new. He explained, “That is just like the ‘touch it once’ principle from the One Minute Manager book!” Yes, I had to confess, it is that basic – as is everything I conjure up. And since when have lawyers become so concise?]

It Is True: Context at Ingestion Is Computationally Preferred

The highest degree of context attainable, per computational unit of effort, is achieved by determining and accumulating context at ingestion: take every new data point (observation) as it is received and first query historical observations to determine how this new data point relates. Once this is determined, what has been learned (i.e., how the new data point relates to other known data points) is saved along with the new data point.
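To make this a bit more concrete, here is a minimal sketch of “relate on arrival, then persist what was learned.” The names (ContextStore, Observation, find_related) are made up purely for illustration – this is not any product’s API:

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    obs_id: str
    attributes: dict  # e.g. {"phone": "555-0101", "name": "J. Smith"}


@dataclass
class ContextStore:
    observations: dict = field(default_factory=dict)   # obs_id -> Observation
    relationships: dict = field(default_factory=dict)  # obs_id -> set of related obs_ids

    def find_related(self, new_obs):
        """Query historical observations for any that share an attribute value."""
        related = set()
        for existing in self.observations.values():
            if set(existing.attributes.items()) & set(new_obs.attributes.items()):
                related.add(existing.obs_id)
        return related

    def ingest(self, new_obs):
        """Relate the new observation to history, then persist both the
        observation and what was learned about it, in one step."""
        related = self.find_related(new_obs)
        self.observations[new_obs.obs_id] = new_obs
        self.relationships[new_obs.obs_id] = related
        for other in related:
            self.relationships.setdefault(other, set()).add(new_obs.obs_id)


store = ContextStore()
store.ingest(Observation("o1", {"phone": "555-0101", "name": "J. Smith"}))
store.ingest(Observation("o2", {"phone": "555-0101", "name": "John Smith"}))
assert store.relationships["o2"] == {"o1"}
```

The second observation is linked to the first at the moment it arrives; no after-the-fact scan of history is ever needed to rediscover that relationship.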

Smart biological systems do this too. For example, as we humans “sense” the surrounding environment, we assemble these streaming data observations (sights, sounds, etc.) into context at that exact moment. And we do this with Sequence Neutral processing – whereby the final context is the same despite the order in which observations are processed – at least for the most part.
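One way to picture Sequence Neutral processing in code: if the step that folds a new observation into the accumulated context is commutative, associative and idempotent, the final context comes out the same no matter the order of arrival. A toy sketch, with set union standing in for a real context-merge operation:

```python
from itertools import permutations


def merge(context, observation):
    """Fold one observation into the accumulated context.
    Set union is commutative, associative and idempotent, so the
    order of arrival cannot change the final result."""
    return context | observation


observations = [
    frozenset({("name", "J. Smith")}),
    frozenset({("phone", "555-0101")}),
    frozenset({("name", "John Smith")}),
]

final_contexts = set()
for ordering in permutations(observations):
    context = frozenset()
    for obs in ordering:
        context = merge(context, obs)
    final_contexts.add(context)

assert len(final_contexts) == 1  # every arrival order yields the identical context
```

Real context engines merge far richer structures than sets, of course – which is exactly why Sequence Neutrality is no trivial feat.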

Not to be too abstract here, but while I have been harping on the importance of creating Sequence Neutral processes – no trivial feat in real-time context engines – I am coming to the conclusion that a few aspects of Sequence Neutrality cannot be handled on data streams at ingestion!

While this gives me a sinking feeling about the consequences for Scalability and Sustainability (i.e., no reloading, no batch processing), I am somewhat comforted by the fact that smart biological systems at the top of the food chain themselves go off-line for batch processing (i.e., sleep).

I’m theorizing that dreams are in fact a species’ effort to re-contextualize the information that could not be ingested with Sequence Neutrality. Because if humans could do this while awake, from a survival and evolutionary standpoint, we would!
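Carrying the analogy back to software: the “sleep” pass might look like a periodic off-line sweep that revisits observations parked at ingestion because they could not yet be related, and tries again now that more history has accumulated. A hypothetical sketch, reusing the made-up ContextStore from the earlier example:

```python
def recontextualize(store, parked_ids):
    """Off-line 'sleep' pass: retry relationship discovery for observations
    that were deferred at ingestion, now that more history exists."""
    still_parked = []
    for obs_id in parked_ids:
        obs = store.observations[obs_id]
        related = store.find_related(obs) - {obs_id}  # ignore self-matches
        if related:
            store.relationships.setdefault(obs_id, set()).update(related)
            for other in related:
                store.relationships.setdefault(other, set()).add(obs_id)
        else:
            still_parked.append(obs_id)  # try again on the next pass
    return still_parked
```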

With all of this in mind, I believe that many architectures, systems and processes that originated in the batch world will probably have a hard time emerging as high-context, intelligent systems.

Further, I think next-generation intelligent systems will be designed to assemble context on streams. And we can go a long way towards intelligence on streams before we must resort to off-line processing.

Postings on this site don’t necessarily represent IBM’s positions, strategies or opinions.

Jeff Jonas is the chief scientist of IBM Software Group’s Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness. Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Netcrime Blog.
