Datafication: Ancient and modern

One of the six steps to gaining insight advantage is creating data, which follows on from asking the right questions.

Once you have defined the questions you should be asking – not defaulting to just asking those that can be answered – the challenge is to uncover or create the data that will deliver the insights required.

The process of creating data has recently acquired the ugly name datafication, popularised in Viktor Mayer-Schönberger and Kenneth Cukier’s book Big Data. Wikipedia defines datafication as a ‘modern technological trend turning many aspects of our life into computerised data and transforming this information into new forms of value’.

This presumes an entirely digital phenomenon, yet the process of creating data so that we can better understand our world has existed since the creation of the first numerical system in Babylon in the second millennium BC. It has been a (if not the) primary driver of economic and social development, particularly since the Enlightenment. More data means better insight, which means better decisions whatever the area – politics, business, healthcare or scientific research.

The expression creating data is perhaps a little misleading because it is more a case of structuring the miasma of information that surrounds us into a format that yields the understanding we seek. Hence datafication, for all its inelegance, is a more accurate description.

All the data that we might ever need exists somewhere; it just needs to be discovered (or uncovered) and structured so that it is amenable to analysis. Looked at in this light, datafication has a long and illustrious history, including:

  • Map creation – from the ancient maps of Babylon, Greece and Asia to current satellite navigation systems;
  • Accounting – starting with the double entry book-keeping first used in 13th Century Florence through to the sophisticated financial ratio analysis that is common now;
  • Experimentation – Galileo employed experiments in the late 16th Century for scientific purposes, and they remain integral to R&D today and to marketing (A/B testing of propositions and messages);
  • Statistical inference – describing distributions in terms of measures of central tendency (mean, median, mode) and measures of dispersion (standard deviation, inter-quartile range) – has its origins in the 17th Century and underlies modern predictive analytic techniques;
  • Graphical visualisation – from the charts of William Playfair (18th Century) and Florence Nightingale (19th Century) to the software-based interactive visualisation that has become increasingly common over the past few years;
  • Market research – for example Likert scale surveys for capturing degrees of belief or opinion and conjoint analysis for determining relative preference and utility.

Maps provide a good example. The creation of a basic map involves defining a point or area according to longitude and latitude, then calculating altitude relative to sea level – essentially decomposing a point into its measurable dimensions and then measuring on them.

To this can be added more measures, classifications (according to measures) or categorisations – average temperature and rainfall, prevailing wind strength and direction, land usage, population density, birth and mortality rates and average income level among many others.

The above process gives meaning to geographic space. Insight is generated through applying structure – identifying how that space can be defined, capturing the data and organising it in a way that delivers understanding.

Six steps for structuring unstructured information into data

Structuring information typically follows six steps:

  1. Identification – of the relevant measurable dimensions and categories into which information can be decomposed
  2. Extraction – of the data and measuring or categorising
  3. Classification – grouping data, whether by interval (0-100m, 100-200m above sea level, etc.) or consolidating low level categories into higher level ones (e.g. wheat fields as arable, potato fields as horticulture, etc.)
  4. Indexation – across classes and categories to create relative measures, also over time to identify degree of change
  5. Summarisation – of the key insights identified by indexation
  6. Interpretation – of the insights, adding meaning through suggesting potential causes with recommendations on further investigation or action to be taken
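
As a rough illustration, the first five steps could be sketched in Python along the following lines, using the map example. The Sample class, the LAND_USE_GROUPS mapping, the function names and the sample values are all assumptions invented for this sketch, not part of any real system, and the sixth step (interpretation) is left to a human analyst.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    """1. Identification: the dimensions each point is decomposed into."""
    latitude: float
    longitude: float
    altitude_m: float  # metres above sea level
    land_use: str      # low-level category, e.g. "wheat"


# 3. Classification: consolidate low-level categories into higher-level ones.
# This mapping is hypothetical and only covers the sample data below.
LAND_USE_GROUPS = {"wheat": "arable", "barley": "arable", "potato": "horticulture"}


def altitude_band(altitude_m: float, width: int = 100) -> str:
    """3. Classification by interval: 0-100m, 100-200m, and so on."""
    lower = int(altitude_m // width) * width
    return f"{lower}-{lower + width}m"


def index_shares(samples):
    """4. Indexation: each group's count relative to an even split (= 100)."""
    counts = Counter(LAND_USE_GROUPS.get(s.land_use, "other") for s in samples)
    even_share = len(samples) / len(counts)
    return {group: round(100 * n / even_share) for group, n in counts.items()}


def summarise(indices):
    """5. Summarisation: report the group with the highest index."""
    top = max(indices, key=indices.get)
    return f"{top} is the most common land use (index {indices[top]})"


# 2. Extraction: measured values, invented here purely for illustration.
samples = [
    Sample(51.50, -0.12, 35.0, "wheat"),
    Sample(51.60, -0.20, 120.0, "wheat"),
    Sample(51.70, -0.30, 180.0, "barley"),
    Sample(51.80, -0.40, 240.0, "potato"),
]

bands = Counter(altitude_band(s.altitude_m) for s in samples)
indices = index_shares(samples)
print(bands)
print(summarise(indices))
```

Indexing against an even split is only one possible choice of baseline; an index over time, or against a regional average, would follow the same pattern.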

Structuring human behaviour using digital technology

To illustrate this approach, take a video of a suspect being interviewed by the authorities in relation to a crime. The ultimate aim of analysing the video is to identify potential lies through inconsistencies (between body language and words) or signs of high stress. Being video-based, this process lends itself to digitisation and automated algorithmic assessment once experts have trained the solution and defined the rules.

Human behaviour provides many measurable dimensions in the form of facial expression, vocal elements and body movement.

  • Expression changes can be quantified (frequency, length of time), categorised according to muscles used then classified as to their meaning (contempt, disgust, anger, fear, surprise, happiness, sadness). Facial asymmetry – which increases with tension – can also be defined geometrically;
  • A similar process can be applied to eye movements (up left, up right, etc.), while blink rates – a potential stress indicator – can also be quantified;
  • Facial colouring can also be decomposed into degrees of red, green and blue (0 to 255 in each case) with changes from baseline – to stress-induced pallor or embarrassment-induced blushing, for example – all quantifiable;
  • Vocal amplitude (decibels), pitch (hertz) and tonal quality (irregularities in pitch and changes in amplitude – jitter and shimmer) are also measurable;
  • Body posture can also be described in terms of angle (forward, backwards or upright) and curvature (slumped, straight). Breathing rate can be tracked via the frequency of fine shoulder movements. Arm and leg movements (face touching, foot-tapping) are also categorisable and quantifiable;
  • The final dimension is language – speed of speech, gap length between phrases and repetitions of words, terms or concepts.
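
Two of the measures above, colour change from baseline and blink rate, might be quantified roughly as follows. This is a minimal sketch: it assumes the face region has already been located and its pixels extracted as (red, green, blue) tuples, and that blinks have already been detected and timestamped. The function names and all the numbers are hypothetical.

```python
def mean_rgb(pixels):
    """Average colour of a face region, given (red, green, blue) tuples in 0 to 255."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def colour_shift(baseline, current):
    """Per-channel change from baseline; a rise in red may suggest blushing."""
    return tuple(round(c - b, 1) for b, c in zip(baseline, current))


def blink_rate(blink_times_s, duration_s):
    """Blinks per minute over a clip lasting duration_s seconds."""
    return 60 * len(blink_times_s) / duration_s


# Invented pixel samples: a calm baseline frame and a later frame.
baseline = mean_rgb([(200, 150, 140), (210, 160, 150)])
current = mean_rgb([(220, 150, 140), (230, 156, 146)])

shift = colour_shift(baseline, current)
rate = blink_rate([1.2, 4.0, 9.5, 13.1], duration_s=20)
print(shift, rate)
```

In practice the hard part is the step assumed away here, namely reliably locating the face and detecting blinks in the footage.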

With suitably accurate photographic equipment and sufficient analytical processing, data extraction and measurement can be entirely digitised. Categorisation and classification require the involvement of analysts in the first instance, but once the groupings are defined the process can be automated. With the classification and measuring complete, indexing becomes a computational task of identifying normal ranges and correlations between abnormalities across indicators.
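
One simple way to approach this indexing step is to treat the opening part of the interview as a baseline, express each later measurement as a z-score against it, and look for moments where several indicators are abnormal together. The sketch below assumes that approach; the function name, the threshold of two standard deviations and the per-window readings are all invented for illustration.

```python
from statistics import mean, stdev


def abnormal_windows(values, baseline_count, threshold=2.0):
    """Indices of time windows whose z-score against the baseline exceeds the threshold.

    The first baseline_count values define the 'normal' range. This assumes
    the baseline values are not all identical (otherwise stdev is zero).
    """
    baseline = values[:baseline_count]
    mu, sigma = mean(baseline), stdev(baseline)
    return {i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold}


# Hypothetical per-window readings for two indicators.
blink_rate = [12, 14, 13, 15, 14, 13, 30, 14, 29]             # blinks per minute
vocal_pitch = [118, 121, 120, 119, 122, 120, 141, 121, 119]   # hertz

blink_flags = abnormal_windows(blink_rate, baseline_count=5)
pitch_flags = abnormal_windows(vocal_pitch, baseline_count=5)

# Windows where both indicators are abnormal at the same time.
co_occurring = sorted(blink_flags & pitch_flags)
print(blink_flags, pitch_flags, co_occurring)
```

Set intersection is a crude stand-in for correlation; with enough data a proper correlation measure across indicators would be more informative.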

Summarisation involves highlighting these abnormalities in tabular or graphical form. Interpretation then translates the information into recommendations. In the first instance this would be handled by analysts but after a while the implicit rules they follow could be codified and programmed into a rules engine with natural language generation capabilities.
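
A codified version of the analysts' rules could, at its very simplest, pair a condition with a sentence template. The sketch below is an assumption about what such a rules engine might look like in miniature, not a description of any real product; the rule names, indicator names and wording are hypothetical.

```python
# Each rule: (hypothetical name, indicators that must all be abnormal, sentence template).
RULES = [
    ("stress", {"blink_rate", "vocal_pitch"},
     "Blink rate and vocal pitch were both abnormal at {time}s; revisit this answer."),
    ("colour", {"facial_colour"},
     "Facial colouring departed from baseline at {time}s."),
]


def recommendations(abnormal_by_time):
    """Turn {time in seconds: set of abnormal indicators} into plain-English lines."""
    lines = []
    for time, indicators in sorted(abnormal_by_time.items()):
        for _name, required, template in RULES:
            if required <= indicators:  # every required indicator is abnormal
                lines.append(template.format(time=time))
    return lines


report = recommendations({
    42: {"blink_rate", "vocal_pitch"},
    10: {"facial_colour"},
})
print("\n".join(report))
```

Keeping the rules as data rather than code means analysts could add or reword them without touching the logic that applies them.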

This use case may seem extreme, but that is deliberate: it highlights how something as seemingly abstract as human behaviour can be datafied with the smart application of digital technology. And even if the precise data to answer a specific question cannot be found, a good proxy usually can. Finding data that enables a hypothesis to be tested is what social scientists call an identification strategy, and success in this area distinguishes the great from the good.

In his book Adapt, Tim Harford writes “While Steven Levitt is famous to a wider audience as the Freakonomics researcher who did the research about the drug dealers and Sumo wrestlers, to other economists he is famous for the brilliance of his identification strategies.”

Asking the right questions requires the very human skill of curiosity, and success in answering them requires the equally human skill of imagination – art to go alongside the science.

Levitt has both in abundance – hence his ability to amaze with insights which others would not even look for. And any organisation that wishes to prosper in the knowledge economy needs to get a bit freaky.

Is your company ready for Freakobusiness?

Jack Springman, Head of Customer Analytics, Sopra Steria
