The GDPR deadline is imminent, and for large organisations still struggling to identify all of their Personal Data it is now too late to employ consultants or redirect staff away from their normal tasks to work on GDPR compliance. Instead, it is time to deploy software tools to automate as much of the Personal Data discovery work as possible, and to keep those tools in place to ensure compliance after 25 May 2018.
To illustrate the scale of the Personal Data Discovery challenge, Silwood Technology recently conducted research into five of the largest and most widely used information application packages, to showcase just how complex the seemingly simple task of locating and categorising Personal Data held across an enterprise can be.
The research revealed that in SAP there are more than 900,000 fields, JD Edwards has more than 140,000 fields and there are 100,000 plus fields in Microsoft Dynamics AX 2012 – all of which may (or may not) contain personal information that requires detection and risk assessment.
Many organisations have been addressing the Personal Data challenge through software such as Information or Data Catalogue solutions within their overall Governance or Compliance programme. Those solutions often incorporate some form of scanner or crawler that connects to many sources, identifies the metadata and imports it automatically.
Others have been using spreadsheet or more home-grown solutions to try to record Personal Data locations and understand how data flows through their organisation. While these solutions can be effective for some IT systems, they will not be as successful for organisations running enterprise CRM or ERP applications from SAP, Oracle, Salesforce, Microsoft or other large application packages.
This is due to the size, complexity and level of customisation of the underlying data landscape of these systems, something that Silwood Technology’s research highlighted. The research also pointed to the need for those solutions to incorporate specialist discovery software designed for the task of locating Personal Data and understanding how it is used and stored by an organisation.
There are eight main approaches open to organisations identifying Personal Data for GDPR and beyond, although many are time consuming and rely on extensive manual interrogation of databases and systems. Here we explore the options for finding all Personal Data:
1. Searching for documentation
While this may seem the natural first port of call when trying to locate Personal Data items in an application, if the data models do exist in this static way, they will only be of limited use in anything but smaller, perhaps home-grown applications with simple data structures.
For those with large scale ERP and CRM packages, navigating documentation to find individual tables and attributes from amongst thousands will be a far more significant challenge. Any useful information found this way also cannot easily be shared with other tools, as it will need to be re-keyed.
2. Manual investigation
Manual investigation typically involves someone being tasked with scouring the relational database (RDBMS) system catalogue for any information which might provide clues as to what data the tables contain. It is a perfectly acceptable approach for small database systems. When a package is limited in scope or has been developed in-house, finding which attributes and fields its tables include, and crucially the relationships between those tables, is not as labour-intensive as it would be in larger systems with a great many tables which do not have useful business names or descriptions.
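As a rough illustration of what a catalogue search of this kind looks like on a small system, the sketch below queries the system catalogue of an in-memory SQLite database and flags columns whose names hint at Personal Data. The table names, the hint list and the function names are all hypothetical, and other database products expose their catalogues differently.

```python
import sqlite3

# Hypothetical substrings that often hint at personal data in column names.
HINTS = ("name", "email", "phone", "birth", "address")


def find_candidate_columns(conn, hints=HINTS):
    """Return sorted (table, column) pairs whose column name contains a hint."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    matches = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            if any(hint in col[1].lower() for hint in hints):
                matches.append((table, col[1]))
    return sorted(matches)


def build_demo_db():
    """Create a tiny demo schema (invented for this example)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, full_name TEXT, email TEXT)")
    conn.execute("CREATE TABLE invoice (id INTEGER, customer_id INTEGER, total REAL)")
    # A table with cryptic names: a name-based search cannot see inside it.
    conn.execute("CREATE TABLE t001 (f1 TEXT, f2 TEXT)")
    return conn


candidates = find_candidate_columns(build_demo_db())
for table, column in candidates:
    print(f"{table}.{column}")
```

The search finds the columns in the descriptively named `customer` table but returns nothing for `t001`, whatever its fields actually hold. That is the limitation described above, multiplied across thousands of tables in a large package.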
3. Employing application or technical tool specialists
Specialists are likely to have the most familiarity with the application and its underlying data model, and access to any technical tools provided by software vendors to use when locating the required information. Their knowledge of the business context of a request for Personal Data may not necessarily be complete though, and such specialists are often in very short supply and extremely busy, so not always available as soon as you need them.
4. Sourcing external consultants
Hiring in external consultants is another common approach. While they may provide an expert resource, there can also be a significant cost involved, in addition to the time required for them to familiarise themselves with the data landscape and its customisations. Relying on outside expertise can also leave in-house knowledge levels lower in the long run.
5. A metadata-driven software approach
Using software to identify the metadata associated with Personal Data across an organisation’s IT ecosystem can make the discovery process considerably faster and more effective. Many data catalogue and governance products have facilities to connect to source systems and import their metadata directly so that it can be investigated more fully. Automating this process reduces the opportunity for error as there is only very limited manual intervention.
This approach does not work for large CRM and ERP systems because of the size, complexity and level of customisation of their data landscapes. There are, however, a few advanced self-service metadata discovery tools, such as Silwood Technology’s Safyr®, which provide a view into the metadata of these packages and allow users to navigate and search for Personal Data attributes and subset them into appropriate categories. That information can then be shared with Data Catalogue or Governance products or even used with Excel.
Metadata-based solutions can accelerate Personal Data discovery considerably, especially when compared to entirely manual or semi-automated processes.
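To make the idea of searching metadata and subsetting it into categories more concrete, here is a minimal sketch in Python. It does not represent how any particular product works: the category rules, the sample records and the function names are invented for illustration, and it assumes the metadata has already been extracted as (table, field, description) records.

```python
import csv
import io

# Hypothetical rules: keyword found in the metadata -> category label.
CATEGORY_RULES = {
    "email": "Contact",
    "phone": "Contact",
    "birth": "Identity",
    "name": "Identity",
    "iban": "Financial",
}


def categorise(fields):
    """Attach a category to each (table, field, description) record that matches a rule."""
    rows = []
    for table, field, description in fields:
        text = f"{field} {description}".lower()
        for keyword, category in CATEGORY_RULES.items():
            if keyword in text:
                rows.append({"table": table, "field": field, "category": category})
                break  # first matching rule wins
    return rows


def to_csv(rows):
    """Serialise categorised rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["table", "field", "category"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample metadata: cryptic field names with business descriptions.
sample = [
    ("T_CUST", "FLD_017", "Customer email address"),
    ("T_CUST", "FLD_018", "Date of birth"),
    ("T_ORD", "FLD_002", "Order total"),
]

export = to_csv(categorise(sample))
print(export)
```

Note that the matches come from the business descriptions rather than from the physical field names, which is why access to meaningful metadata matters so much for this approach.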
6. Using the Internet
An Internet search to locate Personal Data attributes is only really of any value when the data models are in a format that can be published either by vendors or customers. It would not make much sense to publicly exhibit data models of one’s own in-house developed system.
However, it is possible to find metadata definitions for well-known social media platforms, as an example, and occasionally data models from popular ERP and CRM packages which might point you in the right direction of the Personal Data you are seeking. This is often seen as a viable, low cost option, but is labour intensive and also questionable in accuracy terms.
There are also risks. The published information is unlikely to represent the system as implemented by the organisation searching for it, because of version differences or individual customisations. In addition, it is often necessary to ask a technical specialist to interpret the model and augment it with relevant information from the application itself.
7. Guesswork and hypothesis testing
When faced with the problem of Personal Data discovery, many companies use best guess or hypothesis testing methods to try to find the tables and attributes that they need. However, relying on data observation and insight, and on finding an appropriate starting point from which to launch a search, is a strategy that can be frustrating, time consuming and potentially inaccurate.
8. Deploying data modelling or profiling software
Data Modelling tools offer a good solution for finding Personal Data based on their ability to reverse engineer an RDBMS and create a data model from the tables, fields and relationships they find. From there an analyst can try to find the items needed for GDPR.
Data Profiling software can also be useful as it provides the ability to look at data formats to determine if they are likely to contain Personal Data. Sometimes this uses a form of machine learning or other analysis techniques to surface what may be relevant.
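A format-based check of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expressions are deliberately simplified, the 80% threshold is arbitrary, and the column names and sample values are invented. Commercial profiling tools use far richer techniques.

```python
import re

# Simplified patterns for illustration only; real-world formats vary widely.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def profile_column(values, threshold=0.8):
    """Return the first pattern matched by at least `threshold` of the
    non-empty values, or None if no pattern reaches the threshold."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return name
    return None


# Invented sample data for three cryptically named columns.
columns = {
    "f1": ["ann@example.com", "bob@example.org", "", "carol@example.net"],
    "f2": ["1980-02-14", "1975-11-30", "n/a", "1990-07-01", "1988-03-09", "1969-12-25"],
    "f3": ["widget", "gadget", "gizmo"],
}

report = {column: profile_column(values) for column, values in columns.items()}
print(report)
```

A match only suggests that a column may hold Personal Data. A column of dates, for example, could be dates of birth or order dates, so an analyst still has to confirm each finding.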
ERP and CRM package vendors do have tools which can be used by technical specialists for more traditional database and metadata tasks.
However, the particular challenge of trying to locate Personal Data in large packaged CRM and ERP applications is not adequately met using these approaches. This is because of the lack of meaningful metadata in their database schema, the size and complexity of the model and the numbers of attributes to be investigated.
GDPR compliance is not a one-time event, and maintaining compliance will be an essential business process. So whether it means a manual, intensive approach today or turning to software vendors or external consultants, don’t put Personal Data discovery off, or you may end up facing a data reckoning sometime soon.
Roland Bullivant, Sales and Marketing Director, Silwood Technology