How an agile data-lake is powering the science behind predictive marketing

The company you work for probably doesn’t know much about its potential customers. Right now, most of your understanding of prospective leads is pieced together manually by combining Salesforce lists, LinkedIn data and personal experience.

You can’t really tell the difference between a good lead, meaning one that is sales-ready, and a lead that requires significant nurturing. Or, for that matter, a terrible lead that will never buy anything at all.

But at what cost? SiriusDecisions studies show that more than 95 per cent of marketing leads never convert to customers. This erodes the relationship between sales and marketing and wastes precious budget dollars on low-efficiency programs and campaigns.

But that’s OK, right? This approach has been working for you guys since the 90s.

Well, it won’t work any more.

There’s an emerging area of data science, Predictive Marketing, that is radically reimagining the way we look at sales leads and customer data. I’m not talking about a marketing agency selling a silly tool that scrapes Twitter data. I’m talking about hard data science. The sort of science that provides, within minutes, answers and predictions that would take hundreds of hours to generate manually. We can do this by taking advantage of the endless streams of data that your prospects generate to get a smarter view of their organisational wants, needs and habits.

Just as traditional scientists relied on beakers, bunsen burners and petri dishes as the core tools for experimentation, so too the modern data scientist needs sophisticated algorithms coupled with a robust and scalable database to ingest the data and extract useful insights. In this article I’m going to take a look at the tools powering the data science behind our predictive marketing engine. Hopefully this will provide a better view of how data science is being executed in the field to understand and engage with target customers, and ultimately drive sales.

Predictive Analytics: Chasing good companies, not bad guys

For a time I worked for an intelligence agency. Our job was to identify unique patterns in huge data sets. It was through this process that I became familiar with the modern art of data science. After a couple of colleagues and I left the agency, we realised there were many other applications for our skills. Rather than using data to hunt bad guys, we could use it to search out good companies. We could also see there was another industry crying out for this kind of analysis: marketing intelligence.

The company we created, Mintigo, uses predictive analytics to help discover your ideal customer profile, target the prospects with the highest propensity to buy and engage them with the right message through the right channels. By incorporating thousands of marketing indicators, our service can predictively score and segment all potential prospects, even the ones you haven’t met yet.

For example, we worked with the Customer Insights Team at Red Hat, the open source enterprise computing company. It’s a company with a complex offering, multiple product lines and a long sales cycle, so chasing after the wrong leads can be a time-destroying disaster. Mintigo worked with Red Hat to build out predictive models for each of their product lines and help identify the ideal customer profile. We did this by cross-referencing Red Hat’s prospects with very specific marketing indicators - such as “Has Enterprise Architects in the Company” and “Uses WordPress for Company Website”. The results have been fantastic. Mintigo customers like Red Hat usually see a 4x improvement in overall marketing funnel efficiency. This is achieved by halving the number of Marketing Qualified Leads while getting twice as many Sales Ready Leads from them.
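To see where the 4x figure comes from, here is a small worked example. The lead counts are made-up numbers chosen only to make the arithmetic visible, and "funnel efficiency" is assumed here to mean Sales Ready Leads divided by Marketing Qualified Leads.

```python
from fractions import Fraction

def funnel_efficiency(mqls, sales_ready):
    """Share of Marketing Qualified Leads that become Sales Ready Leads."""
    return Fraction(sales_ready, mqls)

# Hypothetical baseline: 1,000 MQLs yielding 50 Sales Ready Leads.
before = funnel_efficiency(mqls=1000, sales_ready=50)

# Half as many MQLs, twice as many Sales Ready Leads.
after = funnel_efficiency(mqls=500, sales_ready=100)

improvement = after / before
print(f"before={float(before):.0%} after={float(after):.0%} improvement={improvement}x")
```

Halving the denominator and doubling the numerator multiplies the ratio by four, whatever the starting numbers are.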

Sounds cool, but how did we actually do the data magic? Well, we built some pretty fancy technology. The core piece of software for us, as it is for many applications, is the database layer, coupled with intelligent machine learning algorithms. Our main database is MongoDB, a non-relational database perfectly suited to the challenges of predictive marketing – massive volumes of data, extraordinary data variety and a need to ingest that data at speed. We need not only to process the data but also to make sense of it. This is where MongoDB’s expressive query language and data processing pipelines give us the capabilities we need to unlock powerful insights.

We initially prototyped on an alternative database technology called PostgreSQL. It’s a great relational database but it soon became clear that it would never handle the schema flexibility or scale that we needed.

We also found that MongoDB has excellent resources and support for our existing native environment of Python and Amazon Web Services. It was already well documented how MongoDB works in this framework so we weren’t forced to change environments or languages to use the database with the best performance. The strength of MongoDB’s support and documentation in a wide range of environments is one reason it’s so popular with companies of all sizes.

Agile data lakes and algorithms 

The first step for us is capturing the data. We ingest all types of data from across the Internet, mostly raw unstructured data, and put it into MongoDB. Then we process all of that data using MongoDB’s MapReduce commands to create our marketing indicators, of which there are thousands. These indicators can be sophisticated, such as which technology an organisation uses, or relatively simple, such as whether the company is part of the Fortune 1000 or how much total funding it has raised.
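The map-and-reduce pattern behind this step can be sketched in a few lines of plain Python. This is an illustration of the pattern only, not MongoDB's MapReduce command and not Mintigo's code: the record layout, the field names and the three indicators are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw records as they might look after ingestion.
RAW_DOCS = [
    {"company": "acme.example", "source": "job_post", "text": "Hiring an Enterprise Architect"},
    {"company": "acme.example", "source": "website", "text": "Proudly powered by WordPress"},
    {"company": "globex.example", "source": "website", "text": "Built with WordPress"},
    {"company": "globex.example", "source": "funding", "amount_usd": 12_000_000},
    {"company": "globex.example", "source": "funding", "amount_usd": 3_000_000},
]

def map_doc(doc):
    """Map step: emit (company, partial indicators) for one raw document."""
    text = doc.get("text", "").lower()
    yield doc["company"], {
        "has_enterprise_architects": "enterprise architect" in text,
        "uses_wordpress": "wordpress" in text,
        "total_funding_usd": doc.get("amount_usd", 0),
    }

def reduce_partials(partials):
    """Reduce step: merge every partial result emitted for one company."""
    merged = {"has_enterprise_architects": False, "uses_wordpress": False, "total_funding_usd": 0}
    for partial in partials:
        merged["has_enterprise_architects"] |= partial["has_enterprise_architects"]
        merged["uses_wordpress"] |= partial["uses_wordpress"]
        merged["total_funding_usd"] += partial["total_funding_usd"]
    return merged

def build_indicators(docs):
    """Group mapped output by company, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for company, partial in map_doc(doc):
            grouped[company].append(partial)
    return {company: reduce_partials(partials) for company, partials in grouped.items()}

indicators = build_indicators(RAW_DOCS)
print(indicators["globex.example"])
```

Because each document is mapped independently and the merge is order-insensitive, the same logic can be spread across many machines, which is what makes the pattern attractive at this volume.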

We pull all this data into a central MongoDB repository and run our algorithms over all of the information. We correlate it with our clients’ lead lists for classification and clustering. From there our data scientists can ask questions to pull out a very specific profile.
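As a rough idea of how indicators and a lead list can be combined into a score, here is a deliberately tiny classifier. It is a swapped-in textbook technique (per-indicator log-likelihood ratios with add-one smoothing, in the spirit of naive Bayes), not the algorithm Mintigo runs. The indicator names, company names and lead history are invented, and the score only counts indicators that are present.

```python
import math

def train_weights(labelled_leads, smoothing=1.0):
    """Learn one weight per indicator from past leads.

    labelled_leads: list of (set of indicator names, converted flag).
    A positive weight means the indicator is more common among converted leads.
    """
    won = [inds for inds, converted in labelled_leads if converted]
    lost = [inds for inds, converted in labelled_leads if not converted]
    vocabulary = set().union(*(inds for inds, _ in labelled_leads))
    weights = {}
    for name in vocabulary:
        p_won = (sum(name in inds for inds in won) + smoothing) / (len(won) + 2 * smoothing)
        p_lost = (sum(name in inds for inds in lost) + smoothing) / (len(lost) + 2 * smoothing)
        weights[name] = math.log(p_won / p_lost)
    return weights

def score(indicators, weights):
    """Sum the weights of the indicators a prospect exhibits."""
    return sum(weights.get(name, 0.0) for name in indicators)

# Hypothetical client lead list with known outcomes.
history = [
    ({"has_enterprise_architects", "uses_linux"}, True),
    ({"has_enterprise_architects"}, True),
    ({"uses_wordpress"}, False),
    ({"uses_wordpress", "uses_linux"}, False),
]
weights = train_weights(history)

# Hypothetical prospects the client has not met yet.
prospects = {
    "initech.example": {"has_enterprise_architects", "uses_linux"},
    "hooli.example": {"uses_wordpress"},
}
ranked = sorted(prospects, key=lambda name: score(prospects[name], weights), reverse=True)
print(ranked)  # ['initech.example', 'hooli.example']
```

A real system would use far more indicators and a proper model, but the shape is the same: learn from leads with known outcomes, then rank everyone else.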

To make this whole process work, we need to ask tough questions of unstructured data quickly. These are big complex processes - checking thousands of indicators against terabytes of data. This challenge is where MongoDB has provided such value. The key is MongoDB’s expressive query language and secondary indexes, which allow us to quickly serve up answers from the database without scanning every record. In my experience, no other database is capable of handling ad-hoc queries against unstructured data in this way, at this velocity.
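Why an index avoids a full scan is easiest to see with a toy version. The sketch below is a conceptual stand-in written in plain Python, not MongoDB's index implementation or query API. The records and field names are invented, and only equality lookups are handled.

```python
from collections import defaultdict

# Hypothetical company records keyed by id.
COMPANIES = {
    1: {"name": "acme.example", "uses_wordpress": True, "fortune_1000": False},
    2: {"name": "globex.example", "uses_wordpress": True, "fortune_1000": True},
    3: {"name": "initech.example", "uses_wordpress": False, "fortune_1000": True},
    4: {"name": "hooli.example", "uses_wordpress": True, "fortune_1000": True},
}

def build_index(records, field):
    """Map each value of `field` to the set of record ids that hold it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        index[record[field]].add(record_id)
    return index

# Built once, ahead of query time.
indexes = {field: build_index(COMPANIES, field) for field in ("uses_wordpress", "fortune_1000")}

def find(criteria):
    """Answer an equality query by intersecting index entries.

    Only records whose ids survive the intersection are ever touched.
    """
    id_sets = [indexes[field].get(value, set()) for field, value in criteria.items()]
    matching = set.intersection(*id_sets) if id_sets else set(COMPANIES)
    return sorted(COMPANIES[record_id]["name"] for record_id in matching)

print(find({"uses_wordpress": True, "fortune_1000": True}))
# ['globex.example', 'hooli.example']
```

The cost of a query now depends on how many records match rather than on how many records exist, which is the property that matters when the data runs to terabytes.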

The way Mintigo pools data could be considered a data lake, though we prefer to think of it as being more agile and managed than that. As I mentioned, we need prompt answers and don’t have time for lengthy batch processes. Often data lakes are associated with Hadoop, which we did look at using early in the process. However, we soon saw that Hadoop and HBase would add more complexity than we were comfortable with, particularly as we’re not a Java shop. We also quickly saw that MongoDB could handle the hefty loads we were putting on it and pull out responses in the minutes we required, rather than the longer time scales often associated with data lakes.

Our database is currently 15TB in size, and growing rapidly. We run MongoDB as a sharded cluster, on top of AWS, and use MongoDB Cloud Manager to automate provisioning of new nodes and provide proactive monitoring of the environment.

Simplicity leads to complex understanding

We’re a company of data scientists but not everyone has the same level of experience in development languages or programming. Using MongoDB is so intuitive that we’ve standardised on it.

A lot of non-developers can easily use MongoDB to run queries. In fact, one of our board members, who had never touched NoSQL before, is using it. There are not many powerful enterprise data tools that you can say that about.

The old ways of sales and marketing are gone. A strong handshake and a good network will only get you so far. For modern enterprises to stay competitive they need a better customer understanding and they need it fast.

To gain that understanding they’ll need rich data and the tools and talent to understand it. We’re now in the era of math men, not mad men. So find your data scientists and give them a good bunsen burner.

Tal Segalov, CTO & co-founder at Mintigo

Image credit: Shutterstock/Sergey Nivens