Data has been heavily over-analogised recently, and many of the analogies do not hold true. It cannot be oil, or any other commodity for that matter, as it is neither a store of value nor does it exist in a natural form waiting to be extracted. Nor can it be a moat, a defensive element against competitors, as simply holding data will not provide a defence for long.
Data can be both an asset and a contra asset (think data breaches), and it can be short term or long term in nature. Too much data can become difficult to manage and increase risk, while too little is insufficient for effective decision making. Low-quality data can misinform decisions, and high-quality data requires significant effort to process. Data in the wrong format or state can be useless, but applied to the right problem statement in a specific time window, it can be incredibly valuable.
The economics of data – ‘Alexa, how does my data behave?’
Most theories about the usefulness of data come down to how data can be applied to create scaled competitive advantage. A good example is Amazon’s ‘customers also bought’; at a basic mechanical level, the assumption is that more data will give a better recommendation, which in turn drives customer stickiness and acquisition. This leads to more feedback (that’s data too), and perpetual motion is solved… or is it?
In a classical economics model, adding more resources with a fixed investment upfront will lead to increasingly favourable economies of scale. In this case, however, the result is the opposite.
A recent, popular example is the use of chatbots in customer support. In most instances, the cognitive assistant begins with a basic but sufficient set of responses to common customer queries or intents. As the corpus grows, the chatbot learns from additional customer queries. Applying traditional economies of scale, we should expect the number of intents the chatbot can resolve to keep increasing, meaning it handles an ever-wider range of customer questions.
Instead, the opposite happens: the volume and cost of data keep rising while the returns diminish.
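The diminishing-return point can be sketched with a toy simulation. Everything here is hypothetical: the saturating coverage curve, the intent count, and the per-record cost are illustrative assumptions, not a model of any real chatbot.

```python
import math

def intent_coverage(corpus_size, learning_rate=0.0005):
    """Fraction of customer intents the chatbot can handle.

    Hypothetical saturating curve: coverage = 1 - exp(-rate * size).
    Each new record teaches less than the one before it.
    """
    return 1 - math.exp(-learning_rate * corpus_size)

TOTAL_INTENTS = 500        # assumed number of distinct customer intents
COST_PER_RECORD = 0.10     # assumed storage + curation cost per query, in £

for size in (1_000, 10_000, 100_000):
    covered = intent_coverage(size) * TOTAL_INTENTS
    cost = size * COST_PER_RECORD
    print(f"{size:>7} records: ~{covered:.0f} intents covered, cost £{cost:,.0f}")
```

Under these assumptions, going from 1,000 to 10,000 records buys a large jump in coverage, but the next tenfold increase in records (and cost) buys almost nothing: classic diminishing returns.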
So if big data doesn’t work, what about small data? New algorithms are emerging for ‘low data’ machine learning techniques such as ‘one-shot learning’ and ‘transfer learning’. These can help bring data to life with limited overheads. For example, OpenAI showed that teaching a robot to perform additional similar tasks, in this case stacking blocks, required only a single demonstration.
Striking the balance – a framework for thinking about the value of data
Laying the foundations of a minimum viable data set
Steps must be taken to collate a usable corpus of data, the set on which training of the machine learning models can begin. This initial set may come from various sources, such as data already held internally.
As a starting point, firms should consider data that has already been collected and needs to be held for legal or regulatory purposes. As this is a sunk cost, the data should be recycled where possible, assuming the necessary customer consents are collected. Another way to recycle data is using data exhaust, which is the data generated as a result of another process. A good method of finding this data is to crawl open-source websites and government resources such as data.gov.uk.
Data may also come from a small group of users, the control group, who will give feedback to the system and, in doing so, generate additional data. A further possible source of the initial set is transfer-learning techniques, using models trained on similar tasks. For example, a machine that recognises ‘cats’ can give a boost to a model that needs to recognise ‘dogs’. Firms may also consider generating the data synthetically, but must be conscious of creating a self-licking ice cream cone, a feedback loop that exists only to sustain itself.
Data has a use-by date
We live in a dynamic world of constant change. As people, environment, and preferences change, any data collected today may be less useful or even have no value tomorrow.
The prediction challenge is reliant on following trends, which means listening to customers. Portfolio managers have a rule of thumb that models maintain their shelf life for at most five years, but if you consider eCommerce use cases, these models may lose their edge in a matter of days or weeks if a competitor is able to leverage better insights.
Firms must treat data as dynamic, removing stale data that is no longer representative and regularly replacing it with new data from new sources.
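A minimal sketch of this ‘use-by date’ discipline, assuming each record carries a collection timestamp (the record shape and the 90-day shelf life are hypothetical choices for illustration):

```python
from datetime import datetime, timedelta

def remove_stale(records, max_age_days, now=None):
    """Keep only records collected within the last `max_age_days`.

    Each record is a (collected_at, payload) tuple; this shape is
    an assumption for the sketch, not a prescribed schema.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [r for r in records if r[0] >= cutoff]

# Hypothetical corpus: one fresh record, one pre-dating a site redesign
now = datetime(2020, 1, 1)
records = [
    (datetime(2019, 12, 20), "recent clickstream"),
    (datetime(2018, 6, 1), "pre-redesign preferences"),
]
fresh = remove_stale(records, max_age_days=90, now=now)
print([payload for _, payload in fresh])  # → ['recent clickstream']
```

In practice the shelf life would be set per use case, following the portfolio-manager rule of thumb for slow-moving models and a far shorter window for fast-moving eCommerce data.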
How should we value data?
Let’s consider the widely-used and accepted formula of return on investment and apply it to the return on data (ROD).
ROD (%) = (Data Benefits − Data Costs) / Data Costs × 100
Benefits of data can come in the form of direct top-line impact, including increased revenue per customer, improved customer experience (think NPS), greater customer acquisition, and/or a reduced sale cycle. They may also constitute bottom-line impacts by reducing operational overheads such as reduced waste, improved employee efficiency, less reliance on increasing your workforce and/or fewer technology overheads. There are some indirect benefits in the risk management area through better understanding your customers, suppliers, book, and employees, which can drive greater insights and help to manage risk.
Direct technology costs of data include infrastructure, which stores and processes all of your data, the employees that keep those systems up and running, and the associated tooling (software) – regardless of whether this is local or in the cloud, both can have significant and sometimes inhibiting costs. Specialist processing techniques such as deep learning may require special hardware, for example, Google’s TPU. There are also some indirect costs associated with the risk management of over storing data.
Using this simple but effective formula will help organisations understand which data they can actually derive value from and which data is no longer needed.
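The ROD formula above can be expressed directly in code. The figures in the example are hypothetical, chosen only to show the arithmetic:

```python
def return_on_data(benefits, costs):
    """Return on data as a percentage: (benefits - costs) / costs * 100."""
    if costs <= 0:
        raise ValueError("Data costs must be positive")
    return (benefits - costs) / costs * 100

# Hypothetical figures: £500k of benefits against £200k of costs
benefits = 500_000   # e.g. increased revenue per customer, reduced waste
costs = 200_000      # e.g. infrastructure, tooling, specialist staff
print(f"ROD: {return_on_data(benefits, costs):.0f}%")  # → ROD: 150%
```

An ROD at or below zero flags data whose costs (storage, processing, risk) outweigh its benefits, a candidate for the ‘no longer needed’ pile.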
How much data is enough in the real world?
There are some great real-world examples of data-centric companies delivering value via data. It can help to think about the total volume of data in much the same way one would approach packing a small suitcase for a weekend trip abroad; taking too much means either it won’t fit or you’ll be charged fees by the airline. Taking too little means you will be underprepared for that swim in the ocean.
There are some basics which are always needed: a clear purpose and outcome in mind, data models to understand the existing data, a basic set of tools to examine, test, and fit the data, and finally a way to visualise the data to tell the story or prove the outcome.
Several companies are now being labelled ‘data startups’, leveraging the power of data and advanced analytics to provide predictive insights:
JetTrack.io leverages flight data from corporate jets to predict corporate transaction activity. For example, it detected abnormal flights from Occidental to Omaha, which presaged a $10BN investment from Berkshire Hathaway in Occidental Petroleum.
Airbnb’s Aerosolve is a machine learning tool that brings insights to where they are needed most: employees and customers. It is the core engine behind the dynamic pricing and price-tip tool for hosts, and leverages Airbnb’s existing data sources to increase value for operators and customers alike. Both the project and the accompanying Airbnb data are open source.
SpookFish, which was acquired by EagleView Technologies in late 2018, provides a 3D aerial view of properties with precise measurements. Using satellite imaging and specialised algorithms, it gives companies insights that would previously have required a physical site inspection, which is particularly useful for insurers, for example.
The oil well is dry, the moat has turned stagnant and the bacon has gone bad
In this digital era, data is a fundamental part of every company’s strategy, operations, and customer experience. However, data in itself will not maintain market share, improve operations, or delight customers. The data-perpetual-motion loop does not exist, and traditional economies of scale do not apply to data.
By taking a holistic approach to data, using data already available, and augmenting this where possible, firms can achieve an initial competitive advantage. Firms can then supplement this with differentiated offerings for customers, new products, and a culture of data-driven insights, which will be far more beneficial than data alone.
Peter Heywood, Partner, Banking and Capital Markets, Genpact