Streaming Analytics vs. Perpetual Analytics (Advantages of Windowless Thinking)

The terms "streaming" and "perpetual" probably sound like the same thing to most people. However, in the context of intelligent systems, I think there is a big difference.

[Note: when I use the term "observation" below, feel free to think of it as a synonym for "transaction" or "record."]

Streaming analytics involves applying transaction-level logic to real-time observations. The rules applied to each observation take into account previous observations only if they occurred within the prescribed window, and these windows have some arbitrary, fixed size (e.g., the last five seconds, the last 10,000 observations, etc.).
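To make that window-bound behavior concrete, here is a minimal Python sketch; the WindowedStream class and its single rule are illustrative assumptions of mine, not the API of any particular streaming engine.

```python
from collections import deque
import time

class WindowedStream:
    """Illustrative streaming-analytics rule: only observations still inside
    a fixed time window are visible when a new observation is evaluated."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds
        self.window = deque()  # (timestamp, observation) pairs

    def ingest(self, observation, now=None):
        now = time.time() if now is None else now
        # Evict everything older than the window; the engine forgets it.
        while self.window and now - self.window[0][0] > self.window_seconds:
            self.window.popleft()
        self.window.append((now, observation))
        # Rules may only consult what remains in the window.
        return self.apply_rules(observation, [o for _, o in self.window])

    def apply_rules(self, observation, visible_history):
        # Example rule: flag an account seen more than three times in the window.
        hits = sum(1 for o in visible_history if o["account"] == observation["account"])
        return "ALERT" if hits > 3 else "OK"
```

Whatever the eviction loop drops is gone for good, and that forgetting is precisely what the windowless approach below avoids.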

Perpetual analytics, on the other hand, evaluates every incoming observation against ALL prior observations. There is no window size. Recognizing how the new observation relates to all prior observations enables the publishing of real-time insight (i.e., The Data Finds the Data and the Relevance Finds the User). Another unique property is Sequence Neutrality (i.e., future observations can affect earlier outcomes).
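Here is an equally minimal sketch of the perpetual side. The PerpetualContext class, its in-memory store, and its key-matching rule are all simplifying assumptions on my part; the only point is that matching runs against the full retained history and that a later observation can merge entities that earlier looked distinct.

```python
class PerpetualContext:
    """Illustrative perpetual-analytics engine: every new observation is
    resolved against everything seen before, and a later observation can
    revise earlier outcomes (Sequence Neutrality)."""

    def __init__(self):
        self.history = []    # every observation ever seen; nothing ages out
        self.entities = {}   # identifying key (e.g., a phone number) -> entity id
        self.next_id = 1

    def ingest(self, obs):
        self.history.append(obs)
        # Find every existing entity that shares any identifying key.
        matches = {self.entities[k] for k in obs["keys"] if k in self.entities}
        if not matches:
            entity_id = self.next_id
            self.next_id += 1
        else:
            # Sequence Neutrality in action: if this observation bridges two
            # entities previously treated as distinct, merge them now.
            entity_id = min(matches)
            for k, v in list(self.entities.items()):
                if v in matches:
                    self.entities[k] = entity_id
        for k in obs["keys"]:
            self.entities[k] = entity_id
        return entity_id
```

Feed it an observation keyed by a phone number, then one keyed by an email address, then a third carrying both keys, and the third observation quietly merges the first two entities; that is Sequence Neutrality in miniature.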

Just to be fair, both streaming and perpetual analytics engines have their place in the world. For example, sometimes transaction volumes are so high that non-persistence and small window sizes are the only route.

However, when the mission is significant and transaction volumes can be managed in real time, perpetual analytics answers the questions: "How does what I just learned relate to what I have known?" "Does this matter?" and "Who needs to know?" And if you can't answer these questions, then your organization is likely to exhibit some degree of Enterprise Amnesia.
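A hypothetical routing layer on top of the PerpetualContext sketch above shows how those three questions might look in code: resolve the observation against everything known, check whether anyone has registered interest, and tell them right away. The subscription model is my own simplification, not a description of any shipping product.

```python
# Hypothetical relevance routing layered on the PerpetualContext sketch above.
subscriptions = {}   # entity id -> set of users who asked to be told

def subscribe(user, entity_id):
    """Register a user's standing interest in an entity."""
    subscriptions.setdefault(entity_id, set()).add(user)

def ingest_and_route(context, obs):
    # "How does what I just learned relate to what I have known?"
    entity_id = context.ingest(obs)
    # "Does this matter?" and "Who needs to know?"
    for user in subscriptions.get(entity_id, set()):
        print(f"notify {user}: new observation on entity {entity_id}")
    return entity_id
```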

So how many observations per second can our current technology sustain? Recently, we achieved a new record: roughly 600 million observations ingested and contextualized in under five days, which works out to a sustained rate of roughly 1,400 observations per second. And amazingly, my team thinks they can double the performance with some more tuning!

Another reason so much throughput is necessary, by the way, is that historical data cannot just be bulk loaded. Constructing context from historical data involves streaming the data in. I sometimes describe this in terms of "sticking a straw into the historical data and slurping it out one observation at a time." In short, such systems must incrementally learn from the past! [Exception: if you do bulk load, then you must first crawl through the bulk-loaded data to contextualize these historical observations as if they had been incrementally ingested.]
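As a sketch of that "straw," assuming the PerpetualContext class above and a plain CSV file as the historical source (both assumptions of mine), the loader below pushes each historical row through the very same ingest path used for live observations.

```python
import csv

def slurp_history(context, path):
    """Stream historical records through the same ingest path used for
    real-time observations: one row at a time, never a blind bulk load."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Each historical row is contextualized as if it had arrived
            # live, so the system incrementally learns from the past.
            context.ingest({"keys": [v for v in row.values() if v]})
```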

Postings on this site don't necessarily represent IBM's positions, strategies or opinions. Jeff Jonas is the chief scientist of IBM Software Group's Threat and Fraud Intelligence unit and works on technologies designed to maximize enterprise awareness; Jeff also spends a large chunk of his time working on privacy and civil liberty protections. He will be writing a series of guest posts for Security Blog.

For more on Entity Analytics, click here.