
Data detectives: How an operational data hub takes the guesswork out of data

For super sleuths like Sherlock Holmes, using logic and the science of deduction to unearth patterns and establish links between the facts of a case was second nature.  

But with the amount of data created and stored on a global level growing exponentially, it’s not so easy these days to play detective. Enterprise data is straining at the seams and disparate data silos are making the problem worse. If data can’t be integrated so that it can be more easily discovered and searched, it’s just not that useful. Companies are missing out on a single 360-degree view of their data, which can bring a wealth of benefits. 

These include gaining valuable and potentially revenue-generating insights into their business processes or customers’ preferences, and avoiding the risk of huge fines by getting their data in order so that they can comply with current and emerging regulations.

The key perpetrator: Data silos 

One of the biggest issues is that much of a company’s data lies in multiple, unconnected data silos, often a legacy of earlier departmental initiatives. Mergers and acquisitions have created even more silos, and multiple copies of data spread across those silos threaten data integrity.

Adding a further data twist, today’s industries - from financial services through to pharmaceutical multinationals - are under growing pressure to comply with new and ever-changing regulations, including MiFID II (the Markets in Financial Instruments Directive) and the EU's GDPR. It’s enough to outwit even the data-deducing prowess of Sherlock Holmes. But there is one way to solve these data integration challenges and extract more value: an Operational Data Hub.

Solving the data conundrum with an operational data hub

An Operational Data Hub is a virtual filing cabinet that can hold a single, unified 360-degree view of all data. Because up to 80 per cent of today’s enterprise data is unstructured or semi-structured - for example PDFs, online data, audio files, and video clips – it makes sense to build the hub using a database that can handle all these different data types. 

An Enterprise-grade NoSQL database fits the bill because it can handle any data type, and it removes the need for many costly hours of complex data wrangling - extraction, transformation and loading (ETL) - a major weakness of traditional relational databases.

Choosing the right Enterprise NoSQL database is key. Look for integrated search and semantic capabilities as well as full enterprise-grade ACID compliance. In an increasingly data-rich world, it’s important to be able to integrate and model complex data to reveal new relationships, patterns and trends. Semantics makes it easier to discover these inferred facts and relationships, creating concepts and categories and providing context.
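To make the idea of "inferred facts" concrete, here is a minimal, hypothetical sketch of semantic inference over subject-predicate-object triples. The entity names, the `subsidiaryOf` predicate and the toy in-memory store are all illustrative assumptions - real semantic databases express such rules declaratively (for example in SPARQL or via ontologies) rather than in application code.

```python
# Toy semantic inference: derive new facts from stored triples by
# repeatedly applying a transitivity rule (A p B and B p C => A p C).
# All names here are illustrative, not any product's actual API.

def infer_transitive(triples, predicate):
    """Expand a set of triples to its fixed point under transitivity."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(facts):
            if p1 != predicate:
                continue
            for (b2, p2, c) in list(facts):
                if p2 == predicate and b2 == b and (a, predicate, c) not in facts:
                    facts.add((a, predicate, c))  # newly inferred fact
                    changed = True
    return facts

triples = [
    ("AcmeCorp", "subsidiaryOf", "GlobalHoldings"),
    ("GlobalHoldings", "subsidiaryOf", "MegaGroup"),
]
facts = infer_transitive(triples, "subsidiaryOf")
# facts now also contains ("AcmeCorp", "subsidiaryOf", "MegaGroup"),
# a relationship that was never stored explicitly.
```

The stored data never said that AcmeCorp belongs to MegaGroup; the relationship emerges from the rule, which is the essence of the semantic inference described above.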

When a database has ACID capability, even the largest datasets are processed consistently and reliably so none of the data is ever altered or lost. Importantly, due to the scalability and agility of NoSQL, the system can also be quickly adapted, extended and enhanced to meet changing business and regulatory requirements.    

Here are three real-world examples of how operational data hubs can be used to solve complex data integration challenges in three sectors: investment banking, pharmaceuticals and insurance.  

The case of investment banking

Banks need to ensure their trade data is high quality, accessible and searchable in order to mitigate risk and maintain compliance with regulatory imperatives, such as MiFID II and Dodd-Frank.  

A leading investment bank built an operational data hub using MarkLogic’s database. This provides a single unified view of derivatives trades and allows a full audit trail for auditors and regulators. It replaced 20 Sybase databases with a single database, making trade information retrievable as well as actionable in real-time. As well as enhancing compliance reporting, it has dramatically reduced maintenance costs as the system is built on a commodity scale-out architecture. The result is a lower cost per trade. The bank can now develop and deploy new software – and therefore launch new products in response to the market – much more quickly.   

Combining technical innovation and banking insight, another leading investment bank working with the MarkLogic database to deploy a trade repository recognised the importance of determining what was known at any particular point in time. Out of those discussions came a new and increasingly important feature: Bitemporal data management. MarkLogic’s Bitemporal capability allows banks to minimise risk through “tech time travel” - time-stamping and rewinding documents to identify changes, viewing data as it was at any point in the past without having to reload data backups. This is critical to maintaining and demonstrating compliance with, for example, Dodd-Frank data analysis processes.
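The bitemporal idea can be sketched in a few lines. This is a hypothetical in-memory illustration, not MarkLogic's actual API: each version of a trade carries both a valid time (when the fact was true) and a system time (when the database recorded it), corrections are appended rather than overwritten, and "what did we know on date X?" becomes a simple query.

```python
# Hypothetical bitemporal sketch - record names, fields and dates are
# illustrative. Corrections append new versions; nothing is destroyed,
# so the view auditors saw on any past date can be reconstructed.
from datetime import date

versions = [
    # (trade_id, notional, valid_from, system_from)
    ("T1", 1_000_000, date(2017, 1, 10), date(2017, 1, 10)),
    ("T1", 1_200_000, date(2017, 1, 10), date(2017, 2, 1)),  # later correction
]

def as_known_on(trade_id, system_date):
    """Return the notional as the system recorded it on a given date."""
    known = [v for v in versions if v[0] == trade_id and v[3] <= system_date]
    return max(known, key=lambda v: v[3])[1] if known else None

as_known_on("T1", date(2017, 1, 15))  # the figure the bank reported in mid-January
as_known_on("T1", date(2017, 3, 1))   # the corrected figure, visible from February
```

Queried in mid-January the trade shows its original notional; queried after the February correction it shows the amended one - without ever restoring a backup, which is the "time travel" the regulators' look-back requirements demand.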

The case of the pharmaceutical industry  

Pharmaceutical firms are scrambling to get their data in order to comply with IDMP - the new ISO standard for the identification of medicinal products - and to submit their data about medicines to the European Medicines Agency, including information on how they should be used, consumed and packaged.

The case gets more complex as much of this information currently lies in multiple, unconnected data silos such as billing, clinical, manufacturing and supply chain management systems.  

Building an Operational Data Hub, using an Enterprise-grade NoSQL database with built-in semantics, is a far more cost-effective way to integrate all of these data silos. Once this single 360-degree view is enabled, it is easy to find the data needed for IDMP compliance - and to see commonality between data sets. This is important, for example, for producing safety summaries from data captured in drug trials, and from real-world data once the drugs are licensed.

The case of the insurance industry

Detection rates for insurers remain frustratingly low; most fraud is only noticed long after the crime has been committed, when it is too expensive to recover the lost monies.

Current rates of fraud detection relying solely on human expertise are at best 10 per cent, and are often far lower. By using an operational data hub, insurers can take advantage of the power of big data, semantics and inference to detect previously unknown fraudulent behaviour. With a 360-degree view of data, it becomes possible to evaluate the claim and its context, comparing the claim with other similar transactions and previous claims in order to identify patterns. 

By analysing data, assigning a risk score to each claim, alerting the right people in real time and delaying settlement of high-scoring suspicious claims, insurers will be in a position to turn the 10 per cent detection rate on its head.
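The score-and-triage step described above can be sketched as follows. The field names, scoring rules and threshold are all hypothetical assumptions for illustration - a real insurer's model would be far richer and typically driven by the full 360-degree claim context rather than three hand-written rules.

```python
# Hypothetical claim risk scoring and triage - rules, field names and
# the threshold are illustrative, not any insurer's actual model.

def risk_score(claim, history):
    score = 0
    # Claims filed very soon after policy inception are suspicious.
    if claim["days_since_policy_start"] < 30:
        score += 40
    # Repeated claims from the same claimant raise the score.
    prior = [c for c in history if c["claimant"] == claim["claimant"]]
    score += 20 * len(prior)
    # Unusually large amounts relative to the claimant's own history.
    if prior and claim["amount"] > 3 * max(c["amount"] for c in prior):
        score += 30
    return score

def triage(claims, history, threshold=60):
    """Flag high-scoring claims for review before any payment is made."""
    return [c for c in claims if risk_score(c, history) >= threshold]

history = [{"claimant": "C9", "amount": 500}]
claims = [
    {"claimant": "C9", "amount": 2000, "days_since_policy_start": 10},
    {"claimant": "C2", "amount": 300, "days_since_policy_start": 400},
]
flagged = triage(claims, history)  # only the first claim is held for review
```

Low-risk claims are paid without delay, while the small high-scoring minority is routed to investigators - the real-time alerting and delayed settlement the paragraph above describes.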

The summing up: Data strategy breakthrough

Regardless of format or source, the devil is in the detail and a company’s ability to delve into its data is a crucial business differentiator. An operational data hub is the perfect flexible data platform to solve today’s complex data integration challenges. It allows businesses to take advantage of the power of big data, semantics and inference to gain those crucial and subtle insights, as well as ensuring relief from the international complexity of ever-changing regulatory compliance requirements. 

Clichés aside - with an operational data hub, data detection is Elementary.

David Northmore, VP of EMEA, MarkLogic

David Northmore
David joined MarkLogic in 2012 as Sales Director for UK & Ireland and was promoted to Regional Director of Northern Europe before becoming VP of MarkLogic EMEA in early 2016.