Digital advertising has always been about finding the fastest way to achieve optimal results, so it’s no surprise the industry has taken a shine to machine learning (ML). Offering efficient data processing and decision making, this intelligent technology helps marketers determine who to target, where to place ads, and how much to bid on available inventory.
But there is one problem: in its traditional form, ML isn’t as independent as it seems. For the most part, systems are based on rules, which means that when exceptions occur – as they often do in an industry that targets real people – algorithms are thrown off-kilter.
Fortunately, there is another kind of smart technology that solves this issue – reinforcement learning (RL). This technique is built to allow greater flexibility, taking the lessons it gains from experience – rather than rules – as its guide.
Let’s take a deeper look at why RL beats basic ML when it comes to effective advertising.
RL: a two-minute overview
To start with, it’s crucial to define what RL is and how it works. Essentially, it is a type of ML – and therefore of artificial intelligence (AI) – in which machines are trained to take the ideal action within a specific context using positive and negative rewards. RL algorithms solve a specific kind of problem: an artificial agent must decide on the best action in its current state by weighing the choices it currently has and the rewards associated with them. A simple yet effective way to think of it is as a mouse in a maze: the mouse is heading for the final reward of cheese at the end, but along the way there are smaller rewards (water) and bad outcomes (blocked paths) that it must assess and navigate. The same applies to machines, except that instead of earning a treat, the machine is rewarded for meeting key performance indicators (KPIs).
The main factor that makes RL different is its fluidity. Rewards are not always immediate and each action changes the environment, which means the ‘maze’ is not fixed: the agent learns how it ought to behave from the rewards it receives and develops, on its own, the most beneficial strategy for meeting the end goal. This is in sharp contrast to conventional ML processes, which require restrictive rules for every scenario and for how the agent should respond to each one.
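To make the mouse-in-a-maze idea concrete, here is a minimal sketch using tabular Q-learning, one common RL algorithm (the article does not name a specific one, so this choice is an assumption). The "maze" is a hypothetical six-square corridor with the cheese at the far end; the reward values, learning rate, and other numbers are illustrative only.

```python
import random

# Hypothetical 1-D "maze": squares 0..5, with the cheese on square 5.
# All names and numbers below are illustrative assumptions.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Move the agent one square. Returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True    # the cheese: final positive reward
    if next_state == state:
        return next_state, -0.1, False  # blocked path: small negative reward
    return next_state, 0.0, False       # ordinary move: no immediate reward


def choose(q_row, rng, epsilon):
    """Epsilon-greedy choice: mostly exploit, sometimes explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([i for i, v in enumerate(q_row) if v == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values purely from experienced rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose(q[state], rng, epsilon)
            next_state, reward, done = step(state, ACTIONS[a])
            # Delayed rewards flow backwards through the discounted target.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best learned move for every non-goal square."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nothing in the code tells the agent to "go right"; that strategy emerges from the rewards alone, which is the point the maze analogy is making.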
Why does it matter to marketers?
Modern marketers don’t have an easy job. Where once consumers would come directly to a certain brand, they can now take their pick of multiple products and services across varied channels. Moreover, consumers use a range of devices to do so: research shows that there are now 3.5 connected devices per person around the globe.
The upshot of this is that brands must work harder to win customers and eclipse rivals: delivering personalised messaging that moves with individuals throughout their multi-screen journeys and aligns with their unique needs. And that, in turn, means marketers need a tool that can analyse huge stores of data – including variations in device, interest, and habits – to establish how ads should be served for maximum, in-the-moment impact.
In short, the situation calls for a blend of efficient data processing and adaptable decision-making that conventional rule-based ML can’t provide, but RL can.
RL in action: how can marketers use it?
So we come to RL application. Although still a relatively new field – certainly in comparison to traditional ML – the industry is becoming increasingly aware of RL’s capacity to tame unstructured data sets and use the insight they contain to inform successful campaigns.
And, at present, there are two central use cases:
1. Boosting speed and scale
The human brain is remarkable – by some estimates capable of processing around 400 billion bits of data per second – but when it comes to speed and scale, it’s no match for machine learning. RL algorithms can make decisions in around 100 milliseconds, drawing on vast data about their context, options, and past lessons to make the best move. The most famous example of this is DeepMind’s AlphaZero defeating Stockfish, then the leading chess engine, after teaching itself to play through self-play reinforcement learning in just four hours. But there are other ways RL can be used to gain a competitive edge. For example, with access to insight about previous media buying and target audience habits, RL systems can help companies make controlled decisions in the programmatic marketplace, placing optimal bids for ads that will reach large yet relevant audiences at the ideal time and the lowest price.
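As a rough illustration of learning bid levels from past outcomes, the sketch below uses an epsilon-greedy bandit, a deliberately simplified relative of full RL in which each decision is treated independently. The auction model (a single rival price drawn uniformly at random), the three bid levels, and the impression value are all hypothetical assumptions, not a description of any real exchange or bidding product.

```python
import random

# Hypothetical setup: every number below is an illustrative assumption.
BID_LEVELS = [0.2, 0.5, 0.8]  # candidate bids, in arbitrary currency units
IMPRESSION_VALUE = 1.0        # assumed worth of a won impression


def auction_reward(bid, rng):
    """Simulated first-price auction against one random rival price."""
    rival_price = rng.random()  # uniform in [0, 1)
    # Win: keep the margin between value and price paid. Lose: nothing.
    return IMPRESSION_VALUE - bid if bid > rival_price else 0.0


def learn_bid(rounds=20000, epsilon=0.1, seed=0):
    """Estimate the average reward of each bid level from experience."""
    rng = random.Random(seed)
    counts = [0] * len(BID_LEVELS)
    avg_reward = [0.0] * len(BID_LEVELS)
    arms = range(len(BID_LEVELS))
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(BID_LEVELS))       # explore
        else:
            arm = max(arms, key=lambda i: avg_reward[i])  # exploit
        reward = auction_reward(BID_LEVELS[arm], rng)
        counts[arm] += 1
        # Incremental running mean of the rewards seen for this bid level.
        avg_reward[arm] += (reward - avg_reward[arm]) / counts[arm]
    best = max(arms, key=lambda i: avg_reward[i])
    return BID_LEVELS[best], avg_reward


if __name__ == "__main__":
    best_bid, estimates = learn_bid()
    print(best_bid, [round(x, 3) for x in estimates])
```

In this toy market the low bid rarely wins and the high bid wins often but leaves little margin, so the learner settles on the middle bid without being given a rule that says so.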
2. Personalising at an individual level
Thus far, the industry’s ultimate goal of delivering personalised advertising on a one-to-one level has been hard to realise, especially in areas such as retargeting. While marketers could access data about individual tastes and behaviour, the only way to engage audiences at scale was to group these individuals back into broader categories. RL machines are set to solve this issue. By collecting multiple data streams – such as site views and purchase records – they will identify the creative messages and offers most likely to strike the perfect chord with individuals, in real time. As more positive rewards flow in, such as strong results for engagement KPIs, they will be able to continuously increase the impact and resonance of messaging for individuals by adjusting ads to focus only on what works for them.
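One way to picture this feedback loop is Thompson sampling, a standard bandit technique used here as a stand-in for whatever a production system would actually run. In the sketch below, the two users, the two creatives, and their click probabilities are invented for illustration, and the click rates are exaggerated far beyond realistic levels so that the toy example learns quickly.

```python
import random

# Hypothetical per-user click probabilities, exaggerated for illustration.
TRUE_CTR = {
    "user_a": {"trainers_ad": 0.30, "cookware_ad": 0.10},
    "user_b": {"trainers_ad": 0.10, "cookware_ad": 0.30},
}


def run_campaign(rounds=4000, seed=0):
    """Serve ads with Thompson sampling and count what each user was shown."""
    rng = random.Random(seed)
    # Beta(1, 1) prior per user and creative: [clicks + 1, non-clicks + 1].
    posterior = {u: {ad: [1, 1] for ad in ads} for u, ads in TRUE_CTR.items()}
    served = {u: {ad: 0 for ad in ads} for u, ads in TRUE_CTR.items()}
    users = list(TRUE_CTR)
    for _ in range(rounds):
        user = rng.choice(users)
        # Draw a plausible click rate for each creative; serve the highest.
        ad = max(
            posterior[user],
            key=lambda a: rng.betavariate(*posterior[user][a]),
        )
        clicked = rng.random() < TRUE_CTR[user][ad]  # simulated engagement
        posterior[user][ad][0 if clicked else 1] += 1
        served[user][ad] += 1
    return served


if __name__ == "__main__":
    for user, counts in run_campaign().items():
        print(user, counts)
```

Each click or non-click updates only that user's estimates, so over time each person is shown more of the creative that works for them, with no audience categories defined in advance.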
In an industry that’s always looking for the next time-, money-, and labour-saving technology, unceasing tides of change are par for the course. Right now, the next big shift on the horizon is a transition that will see the industry move from limited rule-based systems to fluid RL tools that can adjust messages to reflect whatever consumers want, whenever they want it. This means that if marketers want to stay ahead of competitors and deliver the best ads, they must embrace RL before everyone else does.
Claudia Collu, Chief Commercial Officer at MainAd
Image Credit: Jacob Lund / Shutterstock