Compared to their on-premises predecessors, SaaS solutions seem almost too good to be true. Easy to install and scale, and relatively cheap to buy, they optimize individual business functions so well that most departments are able to procure and use systems of their choice.
But this rapid adoption has had severe unintended consequences: data is now spreading like wildfire. Each system holds its own dataset, stored in its own format, which makes the data difficult to combine and trust.
How can teams get a unified view of their data and use it to give customers the cohesive experience they expect? How can executives tell which of their investments are delivering a return, and which boost customer satisfaction? How can business analysts avoid spending days or weeks preparing data before they arrive at meaningful analytics for the business? And how can companies investing in corporate dashboards ensure there is a trusted, consolidated dataset to feed those views?
Fusing disparate datasets into a unified, trusted dataset is often unwieldy. Below are five key challenges that could stand in your way, along with practical tips to overcome them.
1. Challenge: Duplicate Data
Eliminating duplicates takes time and resources. Yet if matching accounts or contacts aren't merged into unique records, duplicate data makes the consolidated dataset inaccurate.
Tip: We’ve found that a two-step process works well here. First, deduplicate within each silo so that no single application contains duplicate records. Then, when combining datasets, link matching records across systems. If a specific application still needs duplicate clean-up, load only its non-duplicate data and flag the duplicates for clean-up in their originating system.
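The two-step process can be sketched in Python. This is a minimal illustration, not Bedrock Data's implementation: it assumes each record is a dictionary with an `email` field and uses the normalized email as the duplicate key, and the function names are hypothetical.

```python
from collections import defaultdict

def normalize_email(email):
    """Lowercase and trim an email so trivial variations share one key."""
    return email.strip().lower()

def dedupe_within_silo(records):
    """Step 1: keep one record per key inside a single system.

    Returns (unique_records, duplicates_to_flag). Duplicates are returned
    rather than discarded so they can be flagged for clean-up at the source.
    """
    unique = {}
    duplicates = []
    for rec in records:
        key = normalize_email(rec["email"])
        if key in unique:
            duplicates.append(rec)
        else:
            unique[key] = rec
    return list(unique.values()), duplicates

def link_across_systems(silos):
    """Step 2: group already-deduplicated records from each system by shared key.

    `silos` maps a system name (e.g. "crm") to that system's unique records.
    """
    linked = defaultdict(dict)
    for system, records in silos.items():
        for rec in records:
            linked[normalize_email(rec["email"])][system] = rec
    return dict(linked)
```

`dedupe_within_silo` keeps the first record it sees for each key and hands back the rest separately. That way, duplicates can be flagged in the originating system rather than silently dropped, and only the unique records move on to cross-system linking.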
2. Challenge: Conflicting Data
One of the greatest benefits of SaaS solutions is that many users and business processes contribute to the database powering each application. The flip side is that different applications end up holding different data about the same customers. If a customer is recorded in two systems under two different accounts, working out which is correct becomes an obstacle to your analysis. This is especially true because resolving such conflicts by hand is impractical: a single update can scatter conflicts across multiple rows, tables, and even databases.
Tip: There are two automated approaches to consider for resolving conflicts in your data: “System of Record” (SoR) and “Last Modified”.
SoR ranks your systems to determine, for a specific type of data, which system wins in a conflict. For example, your CRM could be the SoR for any data related to a sales opportunity.
“Last Modified” means that, for a given field, your record uses the most recently updated value across systems. So if a customer’s phone number differs between your CRM and your support system, the more recently updated value is used.
You may use one approach, or a combination of both, to resolve data conflicts.
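Combining the two approaches might look like the following sketch. It assumes each candidate value arrives as a `(system, value, last_modified)` tuple and that you define the per-field ranking in `SYSTEM_OF_RECORD` yourself. Both the ranking and the field names are hypothetical. Fields with a ranking are resolved by System of Record, and every other field falls back to Last Modified.

```python
from datetime import datetime

# Hypothetical per-field ranking: systems listed from most to least authoritative.
SYSTEM_OF_RECORD = {
    "opportunity_stage": ["crm", "billing"],
}

def resolve_field(field, candidates):
    """Pick one value for `field` from (system, value, last_modified) tuples."""
    ranking = SYSTEM_OF_RECORD.get(field, [])
    by_system = {system: value for system, value, _ in candidates}
    # System of Record: the highest-ranked system that supplied a value wins.
    for system in ranking:
        if system in by_system:
            return by_system[system]
    # Last Modified: otherwise use the most recently updated value.
    _, value, _ = max(candidates, key=lambda c: c[2])
    return value

# Example inputs: one field without a ranking, one field with a ranking.
phone_candidates = [
    ("crm", "555-0100", datetime(2024, 1, 5)),
    ("support", "555-0199", datetime(2024, 3, 1)),
]
stage_candidates = [
    ("billing", "Closed Won", datetime(2024, 3, 1)),
    ("crm", "Negotiation", datetime(2024, 1, 1)),
]
```

Here `resolve_field("phone", phone_candidates)` picks the support system's more recent number. By contrast, `resolve_field("opportunity_stage", stage_candidates)` picks the CRM's value even though the billing value is newer, because the CRM is ranked as the System of Record for that field.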
3. Challenge: Inconsistent Formats
Whereas conflicts jeopardize the accuracy of your data, inconsistent formats make even correct values hard to compare. One SaaS system may format your dates as DD-MM-YYYY while another uses YYYY-MM-DD. The information is accurate, but a mess to query. From phone numbers and states to booleans and capitalization, the list of fields whose formats you could standardize is nearly endless.
Tip: Create consistency by standardizing data into a common format. This speeds up comparisons, because the database doesn’t need to check multiple formats (e.g. “New Hampshire”, “NH”, or “N.H.”) against one another at read time.
When you set rules for which format is the canonical standard for each entity type (e.g. states), establish criteria for abbreviations, acronyms, casing, and order matching. As you eliminate inconsistencies, data quality improves, queries run faster, and analysis becomes more reliable.
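As a sketch of such rules, the Python below converts dates to ISO 8601 (YYYY-MM-DD) and maps state spellings to two-letter codes. The accepted input formats and the alias table are illustrative assumptions. In practice you would usually record each source's date format explicitly, because a single list of patterns cannot tell DD-MM-YYYY from MM-DD-YYYY for a value like 03-04-2024.

```python
from datetime import datetime

# Illustrative input formats seen across source systems (an assumption).
DATE_FORMATS = ["%d-%m-%Y", "%Y-%m-%d", "%m/%d/%Y"]

# Illustrative alias table; a real one would cover every state.
STATE_ALIASES = {
    "new hampshire": "NH",
    "nh": "NH",
    "n.h.": "NH",
}

def normalize_date(value):
    """Return the date as canonical ISO 8601 text (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")

def normalize_state(value):
    """Map known spellings to a two-letter code; otherwise trim and uppercase."""
    cleaned = value.strip()
    return STATE_ALIASES.get(cleaned.lower(), cleaned.upper())
```

Applying these functions once, when data is loaded, means every later query compares a single canonical form instead of matching variants at read time.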
4. Challenge: Critical Data on Related Objects
When SaaS solutions are built and deployed in isolation, their related objects may differ substantially. Related objects cover the range of data tied to a given contact, including their account, opportunities, activities across all departments, support tickets, and more. Much of this related data is lost during extraction, compromising the completeness of the consolidated dataset.
Tip: Match records on common identifiers, drawing on both identifying fields and non-identifying supporting fields. To match a Contact record, for instance, start with the email address, as this identifier offers the highest probability of a unique match across systems. Multi-level deduplication keys should then incorporate additional supporting data such as name, company, and address. Whichever identifiers you use, map related objects as well, so that you end up with a complete, standard data schema to power your analytics.
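A multi-level matching key can be sketched as follows. This sketch assumes contacts are dictionaries with `id`, `email`, `name`, `company`, and `address` fields, and that support tickets carry a `contact_id`. All of these names are hypothetical. Email is tried first, and a combined name, company, and address key is used only when no email match is found. Matched contacts then carry their related tickets into a single record.

```python
def normalize(text):
    """Lowercase and collapse whitespace; missing values become ""."""
    return " ".join(text.lower().split()) if text else ""

def match_keys(contact):
    """Return matching keys from most to least specific."""
    keys = []
    email = normalize(contact.get("email"))
    if email:
        keys.append(("email", email))
    # Fallback key from supporting fields, used only if all three are present.
    supporting = tuple(normalize(contact.get(f)) for f in ("name", "company", "address"))
    if all(supporting):
        keys.append(("name+company+address",) + supporting)
    return keys

def match_contacts(source_a, source_b):
    """Pair contacts across two systems, trying the most specific key first."""
    index = {}
    for contact in source_b:
        for key in match_keys(contact):
            index.setdefault(key, contact)
    matches = []
    for contact in source_a:
        for key in match_keys(contact):
            if key in index:
                matches.append((contact, index[key]))
                break  # stop at the first (most specific) key that matches
    return matches

def build_unified_contacts(crm_contacts, support_contacts, tickets):
    """Match contacts, then attach related support tickets to each unified record."""
    unified = []
    for crm_c, sup_c in match_contacts(crm_contacts, support_contacts):
        unified.append({
            "email": crm_c.get("email") or sup_c.get("email"),
            "crm_id": crm_c["id"],
            "support_id": sup_c["id"],
            "tickets": [t for t in tickets if t["contact_id"] == sup_c["id"]],
        })
    return unified
```

This sketch does not prevent one support contact from matching several CRM contacts. A fuller version would run within-system deduplication first, as described under the first challenge.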
5. Challenge: Continuously Updating Data in Source Applications
Even after you address challenges 1 through 4, one more challenge remains. Data updates continuously, so the consolidated datasets you build can become obsolete as soon as source data changes. Keeping data continuously up to date is a pain. The moment connected data sources fall out of sync, the data feeding your dashboards and other business intelligence tools becomes less reliable for reporting. Manually querying siloed systems for the latest data every time inputs change is tedious. Analyzing datasets, drawing conclusions, and sharing recommendations with the rest of your organization are a better use of your time than worrying about whether your data is current.
Tip: An automated data pipeline can connect the data sitting in your applications to your centralized analytics database, bridging the gap between your applications and your analytics. With such a pipeline, your data is refreshed on a regular schedule, anywhere from every 5 minutes (near real-time) to every 24 hours.
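One common way to build such a pipeline is incremental sync with a high-water mark. On each scheduled run, the pipeline copies only the records modified since the last successful sync. The sketch below assumes source records have already been fetched as dictionaries with an `updated_at` timestamp. It uses an in-memory SQLite table as a stand-in for the central analytics database. The table and function names are hypothetical, and the sketch does not describe any particular product.

```python
import sqlite3
from datetime import datetime

def create_central_store():
    """In-memory SQLite table standing in for the central analytics database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts (id TEXT PRIMARY KEY, email TEXT, updated_at TEXT)"
    )
    return conn

def sync_once(conn, source_records, last_sync: datetime) -> datetime:
    """Upsert records changed since `last_sync`; return the new high-water mark."""
    changed = [r for r in source_records if r["updated_at"] > last_sync]
    conn.executemany(
        # INSERT OR REPLACE overwrites any existing row with the same primary key.
        "INSERT OR REPLACE INTO contacts (id, email, updated_at) VALUES (?, ?, ?)",
        [(r["id"], r["email"], r["updated_at"].isoformat()) for r in changed],
    )
    conn.commit()
    # Advance the watermark only as far as the newest record actually copied.
    return max((r["updated_at"] for r in changed), default=last_sync)
```

Calling `sync_once` from a scheduler, for example every 5 minutes or once a day, produces the refresh cadence described above. The strict `>` comparison is a simplification. A production pipeline would typically also guard against records that arrive late with a timestamp at or before the watermark.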
Consolidated, unified data (what we at Bedrock Data call fused data) saves you time in data prep and gives you a data resource you can trust. It ensures all customer records are available through a trusted, centralized resource, while still allowing the individual SaaS applications to perform their important business functions. A SaaS win-win.
Taylor Barstow, CEO & Co-Founder of Bedrock Data