How not to get ‘confounded’ by data science

Henrik Nordmark, head of data science at data science consultancy Profusion, discusses how data science can be made more useful to businesses without sending them down the wrong path.

Data science is undoubtedly a powerful tool for businesses. It promises to make them smarter, engage with their customers better, improve how staff are managed, increase sales and plan effectively for the future.

However, data science only has value if the insights that are gained are accurate and the right action is taken. Making the wrong assumptions, mistaking correlation for causation or becoming the victim of confoundedness (more on that later), will send a business down the wrong path, possibly with disastrous consequences.

So how do you make sure that the findings from your data science team are useful and not biased? Since the days of Francis Bacon, the 17th-century philosopher, not the painter, the scientific method has hinged on carrying out repeated experimentation. The same is true for data science: businesses need to plan for ongoing experimentation and repeated testing of results.

In practice, this means allocating enough time to your data science team or consultancy to ensure that the results are sufficiently rigorous. It also means that you shouldn’t jump the gun when you make a particularly exciting finding.

‘Confoundedness’ is probably the greatest danger to business decision-making. Put simply, it is attributing success to the wrong factor on the strength of a superficial reading of observational data. It is where the human element causes problems. A basic example would be placing a special offer on goods that seems to lead to a surge in sales. In reality, the special offer may also correspond with a period of great weather that got shoppers out en masse, boosting sales.
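The special-offer example can be made concrete with a toy calculation. All the numbers below are invented for illustration: the offer mostly ran on sunny days, and sunny days boost sales on their own, so a naive comparison of offer days against non-offer days wildly overstates the offer's effect.

```python
# Hypothetical daily sales figures illustrating confounding by weather.
# (weather, offer_running) -> (number_of_days, average_daily_sales)
sales = {
    ("sunny", True):  (8, 120),
    ("sunny", False): (2, 110),
    ("rainy", True):  (2, 65),
    ("rainy", False): (8, 60),
}

def avg_sales(offer):
    """Average daily sales across all days with or without the offer."""
    days = total = 0
    for (_, running), (n, avg) in sales.items():
        if running == offer:
            days += n
            total += n * avg
    return total / days

# Naive comparison: the offer appears to add 39 in average daily sales...
naive_lift = avg_sales(True) - avg_sales(False)
print(f"naive lift: {naive_lift:.1f}")

# ...but comparing like with like, within each weather condition,
# the offer's effect is far more modest (10 on sunny days, 5 on rainy).
for weather in ("sunny", "rainy"):
    lift = sales[(weather, True)][1] - sales[(weather, False)][1]
    print(f"{weather}: lift of {lift}")
```

Most of the apparent lift comes from the weather, not the offer, which is exactly the trap a superficial reading of observational data sets.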

Psychologically, it is extremely tempting to attribute success to decisions we have made and failure to external factors. The solution is to set up further experiments and test every assumption. There is no substitute for rigour. The job of a data scientist and the business at large is to question everything and prove the connection between cause and effect.

Lean methodologies with tiny, frequent business experiments can be a great way of quickly converging on what a business needs to know in a cost-effective and systematic manner that gathers the right data instead of just big data. Similarly, the only way you can find out the effect of a new call to action, such as a new website design or pricing strategy, is by setting up experiments.
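One common shape such an experiment takes is a simple A/B test: randomly split visitors between the old and new design and compare conversion rates. A minimal sketch, using a standard two-proportion z-test and invented conversion counts:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 120 conversions from 2,400 visitors on the old
# design (A) versus 165 from 2,400 on the new call to action (B).
z, p = two_proportion_z(120, 2400, 165, 2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers the difference is statistically significant, but the same discipline applies as above: the test only isolates the design's effect because visitors were randomised, which is what shuts out confounders like the weather.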

If it sounds like I’m repetitively repeating myself regarding the importance of repeated experimentation, it’s deliberate. This point cannot be laboured enough and it’s something that business owners need to have in mind when they deal with data science. It is not enough to marry a clever machine-learning algorithm to a business question and some data. Business questions are often vague or ill-defined and there needs to be a joint process between data scientists and other business stakeholders to sharpen what exactly would be beneficial to investigate.

Consequently, data science can’t exist within a silo; stakeholders throughout the whole business need to be involved in the process. This spurs more relevant questions, analysis and insights, which in turn prompts better understanding of the capability of data science within the business, encouraging more information sharing and ultimately better decision making.

Taking the data science plunge can understandably make a business nervous. However, by having the right processes in place, the willingness to test ideas empirically and the infrastructure to carry out frequent experiments, an organisation can dramatically increase what it is able to learn from data. Businesses that ignore data science as an option or use it badly will likely fail.