Interview: The past, present and future of predictive analytics

The level of accuracy in predictive analytics relies heavily on the quality of data. What is often overlooked is that data quality, in turn, depends on an organisation’s ability to integrate the various systems that each collect their own customer data, and on its ability to analyse that data in context.

Predictive analytics can come up short if the data remains in silos across the organisation (the web team sees website analytics, the loyalty team sees membership levels, the marketing team sees email conversions, and so on).

I spoke to Dominik Dahlem, Senior Data Scientist at Boxever - a customer intelligence and predictive marketing company for airlines and travel retailers - about the past, present, and future of predictive analytics.

How has predictive analytics evolved over the past decade?

The methodologies and algorithms behind predictive analytics date back as far as the 1700s. Over the past decade, much of the popular attention in the analytics field has been on cognitive systems and deep learning. IBM’s Watson brought cognitive systems into the spotlight when it defeated "Jeopardy!" champions and demonstrated that machines can learn, reason and understand natural language.

Deep learning marks a new wave of neural networks that can learn and model high-level abstractions in image, speech, video and other data. Its progress was driven by researchers including Hinton, Bengio, LeCun and Schmidhuber, together with their teams and many others. At the same time, statistical literacy and fundamental programming skills involving data structures and algorithms have spread in recent years, thanks to the proliferating triumvirate of big data, cheap computation and analytics.

These developments, along with related advances in Artificial Intelligence, have been integral to extracting actionable insights from vast amounts of data.

Can you provide some examples of how companies are leveraging predictive analytics today?

As consumers lead increasingly digital lives, they leave breadcrumbs of data across their social, health, purchasing and travel activities, among many others, all waiting to be analysed. The healthcare industry offers powerful examples of predictive analytics. Building on Watson, IBM has developed cloud offerings that surface new insights from personal health data and help clinicians diagnose illnesses from medical images and unstructured medical records. On the consumer side, online shops such as Amazon and movie streaming services such as Netflix have dramatically improved the customer experience by providing personalised recommendations.

While the healthcare and retail industries continue to make impressive advances in how they use predictive analytics, many other industries are only now getting started. Typically, for organisations and markets ready to embrace analytics, the first step is to layer transactional and behavioural data to create rich, contextual customer profiles. These profiles in turn enable organisations to improve the customer experience through personalisation and intelligence-based decision-making.

Many companies have a process for analysing transactional data. As you mentioned, we’re now starting to hear more about behavioural data. What’s the difference, and what are the benefits of melding transactional and behavioural data together?

Transactional data captures the relationships between customers, products and orders through purchasing activities. Traditionally, this information is structured and stored in databases. From transactional information alone, we can draw insights into how products are related through co-purchases, how customers are related by examining their order books, and how customers relate to products. Historically, transactional data is what has most commonly been used to drive promotional marketing, such as targeted email campaigns.
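To make the co-purchase idea concrete, here is a minimal sketch, assuming a hypothetical order book that maps each order ID to the products bought together. It counts how often each pair of products appears in the same order; frequent pairs are the simplest signal of related products. The data and the function name `co_purchase_counts` are illustrative only.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders):
    """Count how often each unordered pair of products appears in the same order."""
    counts = Counter()
    for products in orders.values():
        # Deduplicate and sort so (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(set(products)), 2):
            counts[pair] += 1
    return counts


# Hypothetical order book: order ID -> products bought together.
orders = {
    "o1": ["flight", "hotel", "car"],
    "o2": ["flight", "hotel"],
    "o3": ["flight", "insurance"],
}

pairs = co_purchase_counts(orders)
print(pairs.most_common(1))  # [(('flight', 'hotel'), 2)]
```

Raw counts favour popular products; in practice they are often normalised (for example into lift or confidence scores) before being used for recommendations.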

Behavioural data, on the other hand, may be found in web logs or in dedicated event tables. It typically goes beyond "what happened" to help explain "why it happened". Data analysts refer to this kind of information as unstructured or semi-structured data. Unlike transactional data, behavioural data is typically text-heavy, but it may also include fields such as numbers or dates.
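A common first step with semi-structured behavioural data is to flatten each event into a fixed set of columns so it can sit alongside transactional tables. The sketch below assumes a hypothetical JSON event format; the field names (`ts`, `session`, `params`, and so on) are invented for illustration and not taken from any particular product.

```python
import json

# Hypothetical clickstream event as it might appear in a web log: a JSON
# payload whose nested fields vary from one event type to another.
raw_line = (
    '{"ts": "2016-05-01T10:15:00Z", "session": "s42", "event": "search", '
    '"params": {"origin": "DUB", "destination": "FLR", "adults": 2, "children": 2}}'
)


def flatten_event(line):
    """Turn one semi-structured event into a flat dict with a fixed set of columns."""
    event = json.loads(line)
    params = event.get("params", {})
    return {
        "ts": event.get("ts"),
        "session": event.get("session"),
        "event": event.get("event"),
        # Not every event carries every field: missing keys become None, or 0 for counts.
        "origin": params.get("origin"),
        "destination": params.get("destination"),
        "party_size": params.get("adults", 0) + params.get("children", 0),
    }


print(flatten_event(raw_line))
```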

Behavioural data augments transactions in important ways. For example, based on how someone browses a booking site, travel companies can distinguish a family holiday from a business trip, or relate the products a customer searched for to those they booked. Differentiating searched from booked products also lets us price convenience: for instance, how much a traveller is willing to pay for a shorter flight or more convenient flight times.
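One very simple way to "price convenience" from searched versus booked products is to compare the booked option with the cheapest option the traveller saw in the same session. The sketch below assumes hypothetical fares in a single currency; a real willingness-to-pay model would control for many more factors, so treat this only as an illustration of the signal.

```python
from dataclasses import dataclass


@dataclass
class FlightOption:
    flight_id: str
    price: float       # fare, all options in the same currency
    duration_min: int  # total travel time in minutes


def convenience_premium(searched, booked):
    """Return (extra price paid, minutes saved) versus the cheapest option searched."""
    cheapest = min(searched, key=lambda o: o.price)
    return booked.price - cheapest.price, cheapest.duration_min - booked.duration_min


# Hypothetical options seen in one session.
searched = [
    FlightOption("A", 89.0, 420),   # one stop, cheapest
    FlightOption("B", 129.0, 150),  # direct
    FlightOption("C", 99.0, 300),
]
booked = searched[1]

extra, saved = convenience_premium(searched, booked)
print(f"Paid {extra:.2f} extra to save {saved} minutes")  # Paid 40.00 extra to save 270 minutes
```

Aggregated over many sessions, premiums like this one give a rough picture of how much different customer segments value shorter or better-timed flights.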

Another example is an online retailer that tracks which products you’ve viewed and offers similar ones later on. Classifying behavioural data also enables more targeted in-session personalisation, because product offers can be tailored to the nuances of the clickstream in real time.
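A basic version of "offer similar products" is content-based item similarity. This sketch assumes a hypothetical catalogue in which each product is described by attribute weights, and ranks unseen products by cosine similarity to whatever was viewed in the current session. It is one simple technique among many (collaborative filtering is a common alternative), not a description of any specific retailer’s system.

```python
import math

# Hypothetical catalogue: each product described by simple attribute weights.
catalogue = {
    "hiking_boots": {"outdoor": 1.0, "footwear": 1.0},
    "trail_shoes": {"outdoor": 0.8, "footwear": 1.0, "running": 0.6},
    "rain_jacket": {"outdoor": 1.0, "clothing": 1.0},
    "office_shoes": {"footwear": 1.0, "formal": 1.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse attribute vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def recommend(viewed, k=2):
    """Rank products not yet viewed by their best similarity to anything viewed this session."""
    scores = {
        item: max(cosine(attrs, catalogue[v]) for v in viewed)
        for item, attrs in catalogue.items()
        if item not in viewed
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend(["hiking_boots"]))
```

Because the scores are recomputed from the session’s viewed items each time, the recommendations shift as the clickstream evolves.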

You mentioned the importance of context. Can you talk more about this and why it matters?

Identifying contextual information is paramount to providing best-in-class recommendations. Context matters, for example, when distinguishing the seasonality of certain flight destinations or spotting cross-sell opportunities such as rental cars. A traveller may fly to Tuscany, Italy in the summer months and to the Alps for skiing in the winter.

Recommendations need to be sensitive to this information. If a travel company knows a traveller is about to book their summer holiday, the marketing or e-commerce team can simplify the booking process by offering one-click travel packages with proposed dates and destinations based on that consumer’s needs and preferences. Alternatively, more adventurous travellers can be offered brand-new recommendations that still take their recent searches, and how those searches relate to each other, into account.
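The seasonal behaviour described above can be sketched as follows, under clearly simplified assumptions: seasons are coarse Northern Hemisphere month buckets, the traveller’s history is a list of past bookings, and "adventurous" travellers are simply offered candidate destinations they have never booked. This sketch leaves out the use of recent searches that the answer mentions; all names and data are hypothetical.

```python
from collections import Counter
from datetime import date


def season(d):
    """Coarse Northern Hemisphere seasons; spring and autumn months count as 'shoulder'."""
    if d.month in (6, 7, 8):
        return "summer"
    if d.month in (12, 1, 2):
        return "winter"
    return "shoulder"


def recommend_destinations(history, travel_date, candidates, adventurous=False, k=2):
    """history: (destination, travel date) pairs from past bookings.
    candidates: destinations the business could propose (hypothetical input)."""
    if adventurous:
        # Explore: propose destinations this traveller has never booked.
        seen = {dest for dest, _ in history}
        return [c for c in candidates if c not in seen][:k]
    # Exploit: rank past destinations by how often they were booked in the same season.
    target = season(travel_date)
    in_season = Counter(dest for dest, d in history if season(d) == target)
    return [dest for dest, _ in in_season.most_common(k)]


history = [
    ("Tuscany", date(2014, 7, 10)),
    ("Alps", date(2015, 1, 20)),
    ("Tuscany", date(2015, 8, 2)),
]
print(recommend_destinations(history, date(2016, 7, 1), []))    # ['Tuscany']
print(recommend_destinations(history, date(2016, 12, 15), []))  # ['Alps']
print(recommend_destinations(history, date(2016, 7, 1),
                             ["Tuscany", "Sicily", "Provence"], adventurous=True))  # ['Sicily', 'Provence']
```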

We’ve found that understanding the context of a situation, and layering that into your communications, is one of the most important actions a company can take to increase engagement and conversions.

What are some common challenges that IT teams face when trying to collect and aggregate data from across the organisation?

Traditionally, data capture, management and analysis are designed around the requirements of each individual business unit. In the most extreme case, this leads to the same entities being captured in different formats with different identifiers.

For example, revenue management may hold a different customer identifier and different attributes for the same customer than the marketing department, which collects similar data for web-facing interactions. This overlap is not only inefficient; when it is not analysed and integrated, it also reduces the customer intelligence organisations have at their disposal.

Time zone information is another frequent source of problems. If it is missing, we often need to infer the concrete date and time, which is error-prone, especially around daylight saving time transitions. Businesses operating in multiple markets face a similar challenge with supporting multiple currencies: data integration and business intelligence ultimately require currency conversions, which rely on external exchange rate tables.
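Both the time zone and currency problems can be illustrated in a few lines. The sketch below uses Python’s standard `zoneinfo` module (Python 3.9+, which needs a time zone database on the system) to show that a local London timestamp without an offset is ambiguous when the clocks go back. The exchange-rate table is a hypothetical placeholder with made-up rates, keyed by UTC date, to show why the correct instant matters when picking a rate.

```python
from datetime import datetime, timezone
from decimal import Decimal
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")

# On 30 October 2016, UK clocks went back from 02:00 BST to 01:00 GMT,
# so the local wall-clock time 01:30 occurred twice.
naive = datetime(2016, 10, 30, 1, 30)
first = naive.replace(tzinfo=LONDON, fold=0)   # earlier occurrence, BST (UTC+1)
second = naive.replace(tzinfo=LONDON, fold=1)  # later occurrence, GMT (UTC+0)
print(first.astimezone(timezone.utc).time(), second.astimezone(timezone.utc).time())
# 00:30:00 01:30:00

# Hypothetical exchange-rate table (EUR per unit of foreign currency, by UTC date).
# The rates are placeholders for illustration, not real historical values.
EUR_RATES = {
    ("GBP", "2016-10-30"): Decimal("1.12"),
    ("USD", "2016-10-30"): Decimal("0.91"),
}


def to_eur(amount, currency, utc_day):
    """Convert an amount to EUR using the rate for the given UTC date."""
    if currency == "EUR":
        return amount
    # A KeyError here signals a missing rate instead of silently guessing one.
    rate = EUR_RATES[(currency, utc_day)]
    return (amount * rate).quantize(Decimal("0.01"))


print(to_eur(Decimal("100.00"), "GBP", "2016-10-30"))  # 112.00
```

Storing timestamps in UTC with an explicit offset at capture time, and using `Decimal` rather than floats for money, avoids having to infer either later.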

When integrating two or more data sources, identities have to be matched and linked across the sources covering the different business units, and an over-arching data schema needs to be defined to unify attribute naming and capture. Once aggregated, this information must be kept up to date as each source captures new data.
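A minimal sketch of matching identities and unifying attribute names might look like this. It assumes two hypothetical sources (a revenue management system and a marketing system) with their own identifiers and column names, a hand-written schema mapping, and exact matching on a normalised email address. Real record linkage is often fuzzier or probabilistic; exact email matching is just the simplest deterministic rule.

```python
# Hypothetical records for the same person held by two business units,
# each with its own identifier and attribute names.
revenue_mgmt = [{"rm_id": "RM-001", "Email": " Ann.Lee@Example.com ", "tier": "gold"}]
marketing = [{"mkt_id": "MK-9", "email_address": "ann.lee@example.com", "opt_in": True}]

# Over-arching schema: maps each source's attribute names onto unified names.
SCHEMA = {
    "revenue_mgmt": {"rm_id": "source_id", "Email": "email", "tier": "loyalty_tier"},
    "marketing": {"mkt_id": "source_id", "email_address": "email", "opt_in": "email_opt_in"},
}


def normalise_email(value):
    return value.strip().lower()


def unify(sources):
    """Link records across sources on a normalised email key and merge their attributes."""
    profiles = {}
    for source_name, records in sources.items():
        mapping = SCHEMA[source_name]
        for record in records:
            # Rename to unified attribute names; unmapped columns are dropped.
            row = {mapping[k]: v for k, v in record.items() if k in mapping}
            key = normalise_email(row.pop("email"))  # assumes every record carries an email
            profile = profiles.setdefault(key, {"email": key, "source_ids": {}})
            # Keep each source's own identifier so the link can be traced back.
            profile["source_ids"][source_name] = row.pop("source_id")
            profile.update(row)
    return profiles


profiles = unify({"revenue_mgmt": revenue_mgmt, "marketing": marketing})
print(profiles["ann.lee@example.com"])
```

Re-running the same unification as new records arrive from each source is one straightforward way to keep the aggregated profiles current.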

What are some recommendations for overcoming these obstacles?

Unless an organisation has a single system that collects and analyses information company-wide, mismatches are often difficult to prevent. The key to dealing with duplication is documentation. Every attribute needs to follow a consistent naming scheme and be fully specified: a description that gives the business reason for having it in the first place, a standardised type and format, and a rule for how missing information is encoded.

Standardised types and formats should ideally follow business guidelines that cut across business units, to avoid integration issues later on.
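The documentation requirements above map naturally onto a machine-readable data dictionary. In this sketch, each attribute is described by a hypothetical `AttributeSpec` (name, business description, type, format, missing-value encoding), and a small validator flags undocumented attributes and wrongly typed values. The attribute names and rules are invented for illustration; the `fmt` field is documentation only here and is not enforced.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AttributeSpec:
    name: str          # follows one naming scheme (snake_case here)
    description: str   # the business reason for holding the attribute
    type_: type        # standardised type
    fmt: str           # format rule (documentation only in this sketch)
    missing: object    # how a missing value is encoded


DICTIONARY = {
    "loyalty_tier": AttributeSpec(
        "loyalty_tier", "Drives tier-based offers and service levels.",
        str, "one of: none, silver, gold", "none"),
    "last_booking_date": AttributeSpec(
        "last_booking_date", "Used to time re-engagement campaigns.",
        date, "ISO 8601 calendar date", None),
}


def validate(record):
    """Return a list of problems: undocumented attributes or values of the wrong type."""
    problems = []
    for key, value in record.items():
        spec = DICTIONARY.get(key)
        if spec is None:
            problems.append(f"{key}: not in data dictionary")
        elif value != spec.missing and not isinstance(value, spec.type_):
            problems.append(f"{key}: expected {spec.type_.__name__}, got {type(value).__name__}")
    return problems


print(validate({"loyalty_tier": "gold", "last_booking_date": "2016-05-01", "LoyaltyTier": "gold"}))
# ['last_booking_date: expected date, got str', 'LoyaltyTier: not in data dictionary']
```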

The power of predictive analytics relies heavily on the accuracy of the data being analysed. How can organisations ensure that the data they collect is clean and accurate?

For businesses that provide critical business insights and analytics, data quality plays a crucial role. Ingesting high-quality data makes the reporting of business insights reliable. The flip side is that low-quality data can seriously disrupt the business.

It’s useful to view complex systems as living systems whose emitted data may not be all that clean, and is sometimes even faulty. To ensure that data is clean and accurate, the instrumentation points where data streams are tapped, formatted and stored need to be carefully reviewed. Any time data is handed over to another sub-system, translation errors can occur, and the more sub-systems data passes through, the more exposed it is to potential errors. Every step along the way needs to be carefully reviewed in terms of pre- and post-conditions on the data, as well as potential delays.
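Pre- and post-conditions at a handover can be made explicit in code. The sketch below assumes a hypothetical sub-system step that renames fields and converts amounts from cents to whole units; the handover checks its input before translating, then checks that identity and value survived the translation and that the event is within an assumed 15-minute latency budget. It raises `ValueError` rather than using `assert`, because assertions are stripped when Python runs with `-O`.

```python
from datetime import datetime, timedelta, timezone

MAX_DELAY = timedelta(minutes=15)  # hypothetical latency budget for one hop


def translate(event):
    """Example sub-system step: rename fields and convert minor units (cents) to major units."""
    return {
        "customer_id": event["cust"],
        "amount": event["amount_cents"] / 100,
        "currency": event["currency"],
        "emitted_at": event["emitted_at"],
    }


def handover(event, now):
    # Pre-conditions on the incoming data.
    if not event.get("cust"):
        raise ValueError("pre: missing customer identifier")
    if event.get("amount_cents", -1) < 0:
        raise ValueError("pre: missing or negative amount")
    out = translate(event)
    # Post-conditions: identity preserved, value preserved, delay within budget.
    if out["customer_id"] != event["cust"]:
        raise ValueError("post: customer identifier changed")
    if round(out["amount"] * 100) != event["amount_cents"]:
        raise ValueError("post: amount altered in translation")
    if now - out["emitted_at"] > MAX_DELAY:
        raise ValueError("post: event exceeded latency budget")
    return out


emitted = datetime(2016, 5, 1, 10, 0, tzinfo=timezone.utc)
event = {"cust": "C1", "amount_cents": 12999, "currency": "EUR", "emitted_at": emitted}
print(handover(event, now=emitted + timedelta(minutes=2))["amount"])  # 129.99
```

Rejected events can then be routed to a quarantine table for inspection instead of flowing silently into downstream reports.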

What are your top recommendations for marketing and customer experience departments that are looking to get more out of predictive analytics?

As in any other business domain, the best approach to an analytical task is to understand the decisions a particular function deals with day to day. Working backwards from a high-level business requirement, we can pinpoint the data, and ultimately the predictions, that support the decision-making process.

One example might be data-driven email campaigns that recommend personalised offers to customers. The big questions are Who, What and When, and how can we add value? Predictive analytics can now answer each of these questions for each individual customer. A deployed solution would run such a campaign continuously.

Take an airline, for example: each day as the summer holiday season approaches, the campaign would select a different segment of customers to receive a personalised email based on their inferred destination preferences. Along the way, the system could track which customers have already booked summer destinations, so that they are not spammed with further offers.
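A daily campaign run like the airline example can be sketched as follows. The assumptions are all hypothetical: each customer has an inferred preferred destination (What), a typical booking lead time in days (When), and a flag that is set once a summer booking is observed (suppression). Each day, the customers whose usual booking window opens that day form the segment (Who). How these attributes are inferred is outside the scope of the sketch.

```python
from dataclasses import dataclass
from datetime import date

SEASON_START = date(2016, 7, 1)  # hypothetical start of the summer holiday season


@dataclass
class Customer:
    customer_id: str
    preferred_destination: str   # What: inferred elsewhere, e.g. from searches and bookings
    booking_lead_days: int       # When: how far before departure this customer usually books
    booked_summer: bool = False  # set once a summer booking is seen, to suppress further emails


def todays_segment(customers, today):
    """Who: customers whose usual booking window opens today and who have not booked yet."""
    days_to_season = (SEASON_START - today).days
    return [c for c in customers
            if not c.booked_summer and c.booking_lead_days == days_to_season]


def personalised_email(customer):
    return f"{customer.customer_id}: summer offers to {customer.preferred_destination}"


customers = [
    Customer("C1", "Tuscany", 60),
    Customer("C2", "Algarve", 30),
    Customer("C3", "Crete", 60, booked_summer=True),  # already booked, so never emailed
]

for day in (date(2016, 5, 2), date(2016, 6, 1)):
    for c in todays_segment(customers, day):
        print(day, personalised_email(c))
```

Because the segment is recomputed every day from current booking data, a customer who books between runs automatically drops out of later sends.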
