
How will stream processing impact data management in 2019?

(Image credit: The Digital Artist / Pixabay)

The world is changing. Big data is a term bandied about often enough, but what does it actually mean for businesses in any industry? For starters, they will have to react to data in real time and offer each customer a personalised service based on their unique preferences and history. They will also need to respond to issues instantly and continuously improve their operations. Data, in other words, is the lifeblood of every organisation. Digitally native companies such as Lyft and Uber operate on live streams of data while also providing an excellent customer experience, a standard that all other companies now need to aspire to.

Data streams are everywhere these days. Look around and you will find sources of information that continuously generate data. Everyday life is full of devices producing data streams: smartphones, cars, security sensors and televisions are just some examples, and more are on their way. Smart homes of the future are likely to contain many different types of sensors that detect the changing needs of their inhabitants and adapt domestic services accordingly. For example, motion, audio and visual sensing systems are already being used to revolutionise assisted at-home living for senior citizens.

Stream processing turns the classic query/response paradigm of data processing around: the application logic, analytics and queries run continuously, and data flows through them as a stream of events. This lets users query a continuous data stream and detect conditions within milliseconds of receiving the data. For example, you can query a data stream coming from a temperature sensor and receive an alert when the temperature reaches the freezing point. Stream processing is one of the core technologies driving a new wave of real-time data applications in organisations across all industries. A recent IDC report on the global datasphere predicts that by 2025, 30 per cent of all data generated will be real-time, and that 6 billion consumers will interact with data every day.
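
To make the "standing query" idea concrete, here is a minimal sketch in plain Python, assuming sensor readings arrive one at a time as an iterable of events. `Reading` and `freezing_alerts` are hypothetical names, and this is not the API of Apache Flink or any other stream processor; a production system would also deal with distribution, fault tolerance and out-of-order events.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set

FREEZING_POINT_C = 0.0  # alert threshold taken from the temperature example


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp_ms: int
    celsius: float


def freezing_alerts(readings: Iterable[Reading]) -> Iterator[Reading]:
    """A standing query: readings flow through it one at a time, and the
    reading that first reaches the freezing point is emitted as an alert."""
    frozen: Set[str] = set()  # sensors currently at or below freezing
    for reading in readings:
        if reading.celsius <= FREEZING_POINT_C:
            if reading.sensor_id not in frozen:
                frozen.add(reading.sensor_id)
                yield reading  # alert once per crossing, not on every cold reading
        else:
            frozen.discard(reading.sensor_id)  # re-arm once the sensor warms up


if __name__ == "__main__":
    stream = [
        Reading("s1", 0, 2.5),
        Reading("s1", 10, 0.0),
        Reading("s1", 20, -1.0),
        Reading("s1", 30, 1.0),
        Reading("s1", 40, -0.5),
    ]
    for alert in freezing_alerts(stream):
        print(f"ALERT {alert.sensor_id} at {alert.timestamp_ms} ms: {alert.celsius} C")
```

Note that the query never "asks" for data: it is set up once, and each event is evaluated as it arrives, which is the inversion of the query/response model described above.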

Given the importance of stream processing to the enterprise environment, what are some of the developments that we should expect to see over the coming months?

5G begins to make an impact

Both 5G and the associated proliferation of sensors and IoT devices will create even more real-time streaming data, and more use cases that need an instant reaction to events. Expect stream processing to be used as an efficient way to realise "edge computing": it is a great match both for pre-processing data on devices or gateways and for running event-driven logic at the edge. It will be interesting to see how 5G shapes computing architectures going forward. On the one hand, it could boost centralised processing of data: if moving many gigabytes or terabytes to the cloud carries only a small penalty, could 5G actually put a damper on edge computing?
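
Pre-processing on a device or gateway often means reducing raw readings before they leave the edge. The following is a minimal sketch, assuming readings arrive in timestamp order as `(timestamp_ms, value)` pairs: it summarises them in fixed, non-overlapping (tumbling) windows so that only one record per window needs to be forwarded. The window size and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

WINDOW_MS = 1_000  # hypothetical window size


@dataclass(frozen=True)
class WindowSummary:
    window_start_ms: int
    count: int
    minimum: float
    maximum: float
    mean: float


def _summarise(start_ms: int, values: List[float]) -> WindowSummary:
    return WindowSummary(start_ms, len(values), min(values), max(values),
                         sum(values) / len(values))


def tumbling_summaries(readings: Iterable[Tuple[int, float]],
                       window_ms: int = WINDOW_MS) -> Iterator[WindowSummary]:
    """Emit one summary per window instead of forwarding every raw reading.
    Assumes in-order input; late or out-of-order data is not handled."""
    current_start: Optional[int] = None
    values: List[float] = []
    for ts, value in readings:
        start = ts - ts % window_ms  # align the timestamp to its window
        if current_start is not None and start != current_start:
            yield _summarise(current_start, values)  # previous window is complete
            values = []
        current_start = start
        values.append(value)
    if current_start is not None:
        yield _summarise(current_start, values)  # flush the last open window
```

Closing a window when the next one begins keeps the sketch short; stream processors such as Apache Flink typically close event-time windows using watermarks instead, which also copes with data that arrives late.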

On the other hand, if 5G evolves into an underlying communication fabric that links data centres with edge devices, it could end up boosting edge computing. In this scenario, the storage and processing required for artificial intelligence (AI) and other workloads are shared across the fabric, enabling developers to take better advantage of the available resources.

5G is still in its infancy, but one thing is certain: data is about to get bigger, and it’s going to get faster.

Artificial intelligence applications take off

The expected explosion of AI applications will make distributed stream processing a necessity. Beyond purely streaming machine learning techniques, stream processing will become integral to assembling the complex feature vectors that are fed into machine learning predictors. Distributed, high-performance stream processing frameworks will be a critical factor in efficiently modelling and pre-processing increasingly complex real-time data at scale, for machine learning models and algorithms alike.
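
Assembling feature vectors from a stream usually means keeping a little state per entity and combining it with each new event. This sketch assumes a stream of `(user_id, amount)` events and computes a few illustrative features per event. The features themselves, `RECENT_N` and all names are hypothetical, chosen only to show the pattern.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Iterable, Iterator, List, Tuple

RECENT_N = 3  # hypothetical size of the "recent activity" window


@dataclass
class UserState:
    count: int = 0
    total: float = 0.0
    recent: Deque[float] = field(default_factory=lambda: deque(maxlen=RECENT_N))


def feature_vectors(events: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, List[float]]]:
    """For each event, yield [amount, prior event count, prior mean amount,
    mean of the last RECENT_N prior amounts] for that user."""
    state: Dict[str, UserState] = defaultdict(UserState)
    for user_id, amount in events:
        s = state[user_id]
        # Historical features use only events *before* this one.
        prior_mean = s.total / s.count if s.count else 0.0
        recent_mean = sum(s.recent) / len(s.recent) if s.recent else 0.0
        vector = [amount, float(s.count), prior_mean, recent_mean]
        # Update the per-user state only after the vector has been built.
        s.count += 1
        s.total += amount
        s.recent.append(amount)  # deque(maxlen=...) drops the oldest value
        yield user_id, vector
```

Each vector could then be handed to a predictor as the event arrives; in a distributed framework, the per-user state would typically be partitioned by `user_id` so that it scales across machines.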

Complex data management becomes a no-brainer

Expect to see stream processing adopted for complex data management. Rather than relying on relational databases, stream processors will be able to process ACID transactions directly across streams and state. Just as event streams hold the source of truth for changes, ACID-compliant stream processing can resolve many streams with overlapping and conflicting changes into a consistent state. And all of this comes at a fraction of the cost, with greater flexibility and easier deployment.
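
To make "resolving conflicting changes into a consistent state" more tangible, here is a deliberately simplified, single-process sketch of one common technique, optimistic concurrency control. Each transaction records the versions of the keys it read. It commits only if those versions are unchanged, in which case all of its writes are applied together; otherwise none are. The sketch assumes the input streams have already been merged into a single ordered stream, illustrates atomicity and isolation only, and is not how any particular stream processor implements ACID transactions. `Txn`, `VersionedState` and the account keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class Txn:
    txn_id: str
    reads: Dict[str, int] = field(default_factory=dict)   # key -> version seen when prepared
    writes: Dict[str, int] = field(default_factory=dict)  # key -> new value


class VersionedState:
    """Keyed state in which every key carries a version number."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.versions: Dict[str, int] = {}

    def version(self, key: str) -> int:
        return self.versions.get(key, 0)  # never-written keys are at version 0

    def try_commit(self, txn: Txn) -> bool:
        # Validate all reads first: if any key changed after the transaction
        # read it, the transaction conflicts and none of its writes apply.
        if any(self.version(key) != seen for key, seen in txn.reads.items()):
            return False
        # All checks passed, so apply every write (all-or-nothing).
        for key, value in txn.writes.items():
            self.values[key] = value
            self.versions[key] = self.version(key) + 1
        return True


def apply_transactions(state: VersionedState,
                       txns: Iterable[Txn]) -> Iterator[Tuple[str, bool]]:
    """Process transactions one at a time, in stream order, so the outcome
    is equivalent to running them serially."""
    for txn in txns:
        yield txn.txn_id, state.try_commit(txn)
```

Because the sketch processes a single ordered stream, isolation comes almost for free; a distributed implementation must also coordinate commits across partitions and make them durable, which this example does not attempt.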

Easing your GDPR compliance woes

Stream processing is an easier way to build a GDPR-compliant data infrastructure. Classic “data at rest” architectures make it very difficult to locate sensitive data. Streaming data architectures work directly with “data in motion”, without any need for long-term storage or data replication. They also make it easy to keep sensitive information isolated in application state for a limited time, which makes them naturally suited to compliance.
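
Keeping sensitive data in application state "for a limited time" amounts to attaching a time-to-live (TTL) to state entries. Stream processors such as Apache Flink offer a configurable state TTL; the sketch below is not that API but a minimal stand-alone illustration, assuming the caller supplies the current time in milliseconds. `ExpiringState` and its methods are hypothetical.

```python
from typing import Dict, Optional, Tuple


class ExpiringState:
    """Keyed state whose entries are deleted once they are ttl_ms old."""

    def __init__(self, ttl_ms: int) -> None:
        self.ttl_ms = ttl_ms
        self._entries: Dict[str, Tuple[int, str]] = {}  # key -> (written_at_ms, value)

    def _expired(self, written_at_ms: int, now_ms: int) -> bool:
        return now_ms - written_at_ms >= self.ttl_ms

    def put(self, key: str, value: str, now_ms: int) -> None:
        self._entries[key] = (now_ms, value)

    def get(self, key: str, now_ms: int) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        written_at_ms, value = entry
        if self._expired(written_at_ms, now_ms):
            del self._entries[key]  # delete expired data, don't merely hide it
            return None
        return value

    def purge_expired(self, now_ms: int) -> int:
        """Remove every expired entry; returns how many were removed."""
        expired = [k for k, (t, _) in self._entries.items() if self._expired(t, now_ms)]
        for key in expired:
            del self._entries[key]
        return len(expired)

    def __len__(self) -> int:
        return len(self._entries)
```

Calling `purge_expired` periodically (for example from a timer) matters as much as the check in `get`: data that is never read again would otherwise stay in state indefinitely.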

Helping with ever-evolving cyber security challenges

Cyber security is always in the headlines and will continue to be a closely scrutinised issue in information technology. To detect and prevent security breaches, cyber security solutions need to look for anomalies in metrics and usage patterns across network infrastructure, applications and services. Stream processing is a great match for these challenges. Through streaming ETL, it offers real-time extraction, transformation and aggregation of events; it can also perform complex event processing to detect patterns, and evaluate and adjust ML models in real time over continuous streams of events.
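
One simple pattern that complex event processing might look for is a burst of failed logins from a single source. The sketch below is a hand-written stand-in for a CEP engine rather than any library's API. It assumes events arrive in timestamp order as `(timestamp_ms, source_ip, outcome)` tuples, and the threshold, window length and names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

MAX_FAILURES = 3     # hypothetical: failures that count as a burst
WINDOW_MS = 60_000   # hypothetical: length of the sliding window


def brute_force_alerts(
    events: Iterable[Tuple[int, str, str]],
    max_failures: int = MAX_FAILURES,
    window_ms: int = WINDOW_MS,
) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp_ms, source_ip) whenever a source accumulates
    max_failures failed logins within window_ms. Events must be in time order."""
    failures: Dict[str, Deque[int]] = defaultdict(deque)
    for ts, source_ip, outcome in events:
        if outcome != "fail":
            continue  # only failed attempts are part of this pattern
        recent = failures[source_ip]
        recent.append(ts)
        # Slide the window: forget failures that are too old to matter.
        while ts - recent[0] >= window_ms:
            recent.popleft()
        if len(recent) >= max_failures:
            yield ts, source_ip
            recent.clear()  # start afresh so one burst raises one alert
```

Because only a short deque of timestamps is kept per source, the detector's memory grows with the number of active sources rather than with the total volume of events.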

With more enterprises embracing a ‘real-time’ operating model, the world of data management can only get more interesting. As companies transition to a streaming data architecture, they can use innovative real-time data applications to become software-operated, respond to data streams as events happen, and thrive in the ever-evolving world of complex big data.

Aljoscha Krettek, Co-founder & Engineering Lead, Ververica

Aljoscha Krettek is a PMC member of Apache Beam and Apache Flink, where he focuses predominantly on the Streaming API as well as the design and implementation of additions to Flink’s APIs.