Data quality should be a highly celebrated and nurtured initiative inside an organisation. However, it is all-too-often overlooked and ignored, creating a dangerous path. Ignored and overlooked data can cause chaos in and out of organisations. According to Forrester, 70 per cent of organisations feel poor data or inconsistent data impacts their ability to make decisions. Furthermore, an Experian study found that 72 per cent of global organisations say data quality issues impact customer trust and perceptions.
Why does bad data happen?
Poor data can be caused by a variety of conditions and traditionally falls into four categories: data entry, processing, migration and data decay.
Data entry is a human condition. Bad data happens because of interactions with systems that require humans to enter information. Typical errors include typos, character transpositions, misspellings, inconsistent use of abbreviations, and filling a data field with a placeholder value (e.g. 9999999 or 123456) simply so the person can move on to the next data capture field.
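As a rough illustration of how placeholder entries like these might be caught, here is a minimal Python sketch. The list of filler values and the "same character repeated five or more times" rule are assumptions made for the example, not an established standard; a real list would come from profiling your own data.

```python
import re

# Hypothetical filler values; replace with what profiling of your own data reveals.
KNOWN_FILLERS = {"123456", "1234567", "n/a", "na", "none", "test", "unknown"}


def looks_like_placeholder(value: str) -> bool:
    """Return True if the value looks like filler typed to get past a required field."""
    cleaned = value.strip().lower()
    if not cleaned:
        return True  # empty or whitespace-only input
    if cleaned in KNOWN_FILLERS:
        return True
    # One character repeated five or more times, e.g. "9999999" or "xxxxx".
    if re.fullmatch(r"(.)\1{4,}", cleaned):
        return True
    return False
```

A heuristic like this will occasionally flag a legitimate value, so it is better suited to routing entries for review than to rejecting them outright.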
Processing issues occur due to the diversity and disparity of the data landscape. Long gone are the days of information being in operational relational databases that reside inside the organisation. Now data exists in data structures across multiple locations, in various formats, with different owners and standards for encoding information. The data might also have differing time intervals and capture methods.
Migration causes bad data when information is moved from one environment to another, and data transformations occur in the new system that are not shared with the original source system. Thus, the data between the two systems is misaligned, causing confusion and data mistrust. Additionally, when companies purchase new systems and copy data from the old to the new, inherent data problems are often not addressed and prompt the same challenges as before.
Mergers and acquisitions cause a series of business challenges. Duplicate systems naturally cause the data residing in them to be misaligned, inconsistent, and duplicated. As a result, the organisation doesn’t know which data is right for a decisioning function.
Data decay materialises when data that was initially appropriate for a specific use is no longer appropriate for that use, or when data being used for a decisioning process is out of date due to latency.
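One simple way to make decay visible is to attach a maximum acceptable age to each use of the data and check records against it. The sketch below assumes each record carries a last-updated timestamp; the use-case names and age limits are hypothetical, and the right limits are a business decision rather than a technical one.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical freshness limits per use case.
MAX_AGE = {
    "product_recommendation": timedelta(days=365),
    "executive_dashboard": timedelta(days=1),
}


def is_stale(last_updated: datetime, use_case: str,
             now: Optional[datetime] = None) -> bool:
    """Return True if the data is older than the limit for this use case."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > MAX_AGE[use_case]
```

The same record can be fresh enough for one use and stale for another, which is why the limit is keyed by use case rather than stored on the record itself.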
Consequences of bad data
Expense. According to Harvard Business Review, bad data costs the US more than $3 trillion a year. In addition, Experian found that bad data has a direct impact on the bottom line of 88 per cent of all American companies and that the average loss from bad data was approximately 12 per cent of overall revenue.
Customer perception. I can speak from personal experience that most organisations I do business with have no idea who I am. For example, about six years ago I bought a baby shower gift from a large online retailer. I had not purchased other ‘baby’ items from the retailer before that purchase, and I have not since. However, I am still getting recommendation emails and online prompts for baby supplies. (Even if I had had a baby six years ago, this child would no longer need ‘baby’ items.) Is this a data quality issue or a marketing problem? I argue the former. This is a bad data issue that causes marketing to push an incorrect recommendation. The information the organisation holds in its system is old and should have been purged, updated or archived. It is also a marketing issue, because their systems do not account for a one-time purchase that is an anomaly against the rest of the purchasing behaviour and should be excluded from future algorithms. Hence it’s no surprise that 72 per cent of global organisations say data quality issues impact customer trust and perceptions.
Undermining the decision-making process. According to the Global Data Management Benchmark Report, 33 per cent of the C-suite believe their organisation’s data is inaccurate and undermines their ability to make strategic decisions. This should not come as a surprise, as only 31 per cent of companies claim to be data-driven, meaning close to 70 per cent still view data as a by-product rather than a valuable corporate asset. Since data is not viewed as an asset in most organisations, it is neglected and overlooked, making the decision-making process inaccurate, conflicted and untrusted by everyone across the organisation.
Lower productivity. Bad data increases latency because the data community spends 80 per cent of their time trying to fix data issues and only 20 per cent of their time making informed decisions. Most organisations spend their time fixing data manually and/or writing code. This process slows decisioning, is neither scalable nor automated, and is traditionally undocumented. This dysfunctional process causes data to be misaligned, incorrect, inconsistent, and conflicted, leading to a lack of trust in the data which undermines the decision functions of the organisation.
What is needed for good data?
Having good data starts with a plan. Think of it like a trip to your favourite vacation destination: you must plan how to get there and decide what you want to do once you arrive. An itinerary helps you decide when you want to go and what constraints you must address. If you have children in school, you need to decide if you take them out of school for the trip or wait until they have time off. How long do you want to stay? How long can you afford to stay? Do you fly, drive, or take the train for the vacation?
A good data plan involves the same types of questions. You must put in place a plan to help identify what good data means to your organisation, how you are going to get there, and what you are going to do once you have it. Have an objective in place. For example: “we want to make sure our C-suite trusts the data they are using for strategic decisioning.”
The plan should be broken into action steps on how you will meet the objective. These steps must have measurable and time-orientated checkpoints in place to ensure you are still on the right path. For example, if your action step is to identify what data the C-suite is using and needs for their decision-making process, your checkpoint is to set up meetings with report writers to identify what data is being used for executive reporting and what challenges they have with data (access, integration, accuracy, timeliness, etc.). Find out how they resolve these data challenges and what else they’d like to have to address latency within the process. Then, set a deadline to complete these checkpoints.
Document findings from your checkpoints and share them with the larger team. Make sure you connect your findings back to the objectives and clearly outline next steps. Having the process documented keeps plans on track and will help reinforce business initiatives and the value they bring to the organisation. Documentation allows you to identify the use cases, the actions that need to be taken on the data, and which tools to select in order to fix persistent issues.
At some point in this planning process, action must happen. Prioritise data quality processes based on the needs discovered in the steps above. Identify tools that are business friendly and encourage reusability, scalability, and shareability. These tools must address multiple data domains and languages within a single unified offering, and need to have metadata, lineage, impact analysis, and change data capture capabilities. And lastly, make sure the data quality functions can integrate higher in the transactional stack to apply data cleansing rules as the data enters the organisation. Remember to start small and then build upon the success.
Bad data doesn't stop
Because bad data continuously enters the organisation, like a virus, there must be mechanisms in place to help prevent bad data from infecting the healthy data environment. Put data quality higher in the transactional stacks of the organisation, so you can clean data at the point of entry before it lands in your operational systems. When data does not meet business rules, it must be put into a remediation queue so that it doesn’t infect the clean environment. A manual or AI process can then act on these unique data conditions, with new rules added if applicable.
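The point-of-entry check and remediation queue described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference design: the two business rules, the field names and the `EntryGate` class are all hypothetical, and a production system would use a durable queue rather than in-memory lists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = dict
Rule = Callable[[Record], bool]

# Hypothetical business rules, keyed by a readable name.
RULES: Dict[str, Rule] = {
    "email_has_at_sign": lambda r: "@" in str(r.get("email", "")),
    "country_is_present": lambda r: bool(str(r.get("country", "")).strip()),
}


@dataclass
class EntryGate:
    """Checks records on entry; failures are held back from the clean store."""
    clean: List[Record] = field(default_factory=list)
    remediation_queue: List[Tuple[Record, List[str]]] = field(default_factory=list)

    def submit(self, record: Record) -> bool:
        # Collect the names of every rule the record breaks.
        failed = [name for name, rule in RULES.items() if not rule(record)]
        if failed:
            # Keep the record and the reasons together for manual or automated review.
            self.remediation_queue.append((record, failed))
            return False
        self.clean.append(record)
        return True
```

Storing the names of the failed rules alongside each queued record gives whoever handles remediation the reason for the rejection, and makes it easier to spot when a new rule is needed.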
Lastly, data health monitoring must be established as part of the data quality programme. This process monitors the health of the data across the data landscape and publishes alerts when data is not meeting certain thresholds, so a course of action can be taken on the data triggering the alert.
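To make the idea of threshold-based monitoring concrete, here is a minimal Python sketch that measures one health metric, completeness, and produces an alert message when a field falls below its threshold. The field names and threshold values are assumptions for illustration; a real programme would track more metrics and publish alerts to whatever channel the organisation already uses.

```python
from typing import Dict, List

# Hypothetical thresholds: minimum share of records with a non-empty value.
THRESHOLDS: Dict[str, float] = {"email": 0.95, "postcode": 0.90}


def completeness(records: List[dict], field_name: str) -> float:
    """Share of records in which the field has a non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field_name) or "").strip())
    return filled / len(records)


def health_alerts(records: List[dict]) -> List[str]:
    """Return one alert message per field whose completeness is below threshold."""
    alerts = []
    for field_name, minimum in THRESHOLDS.items():
        score = completeness(records, field_name)
        if score < minimum:
            alerts.append(
                f"{field_name}: completeness {score:.0%} is below threshold {minimum:.0%}"
            )
    return alerts
```

Run on a schedule against each data source, a check like this turns data health from an occasional audit into a routine signal.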
Value of good data
Good data provides the foundation for better decisions, improves productivity, enhances the customer experience and brings significant returns. A recent study by Forrester states that a “ten per cent increase in data accessibility will result in more than $65 million in additional income for a typical Fortune 1000 company.” I would venture to say that if that additional 10 per cent consists of quality data, the figure will be significantly higher. In my experience, most organisations see a return on investment for their data management tools within four to six months.
The value of good data far outweighs the value of poor data. With quality data, data consumers and corporate executives can make better decisions that positively impact both your organisation and customers. Long live data quality!
Kim Kaluba, Senior Product Marketing Manager in Data Management, SAS
Image Credit: IT Pro Portal