Machine learning is all the rage in today’s analytics market. According to Kenneth Research, the value of machine learning is growing sharply and is expected to exceed $23B by 2023, an annual growth rate of 43 percent between 2018 and 2023. IDC reinforces this point, predicting that worldwide spending on cognitive and AI systems, which include machine learning, will reach $110B by 2024. Likewise, Gartner estimates that machine learning and AI will create about $3.9T in business value in 2022. With predictions like these, it’s no surprise that organizations want to incorporate these popular (and lucrative) methods into their analytical processes.
Machine learning for data preparation
Machine learning is not a new concept in the analytical lifecycle. Data scientists have been using it to facilitate analytical processes and drive insights for decades. What is new is the use of machine learning for data preparation tasks, to accelerate data processes and expedite analytical efforts. Here are four ways data preparation can leverage machine learning to get data ready faster and more effectively:
1. Data transformation recommendations built into solutions suggest how data needs to be standardized and converted to meet analytical needs. This feature can proactively assess the quality of the data set and identify which quality transformations should be executed to ensure the data is ready for analytics. The recommendations draw on historical preparation tasks, and AI/machine learning is used to present new recommendations to the user.
2. Automated analytical partitioning applies AI/machine learning to determine the best way to partition the data for analytics. It also provides transparency about which method should be used and why. This speeds up the analytical process because the data is automatically grouped into training, validation and test buckets.
3. Smart matching incorporates AI/machine learning to proactively group like data elements together. Once the most effective matching method has been applied, the user can decide whether to automatically build a golden record and assign unique keys to the data.
4. Intelligent data assignment gives the data and analytics community a quick understanding of how each data element is classified (e.g., name, address, product, SKU), which allows simple tasks like gender assignment to be performed without user intervention. The data automatically populates a data catalog, natural language processing is used to explain the data, and the results contribute to the lineage for quick impact analysis.
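To make the partitioning idea in point 2 concrete, here is a minimal sketch in Python of splitting records into training, validation and test buckets. It assumes a stratified 60/20/20 split on a label column, so that rare cases such as fraud appear in every bucket. The function name `partition`, the `is_fraud` field and the split fractions are illustrative assumptions, not the behavior of any particular product, which would choose the partitioning method automatically.

```python
import random
from collections import defaultdict


def partition(records, label_key, fractions=(0.6, 0.2, 0.2), seed=42):
    """Stratified split into train/validation/test buckets (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable

    # Group records by label so each bucket keeps the same label mix.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    buckets = {"train": [], "validation": [], "test": []}
    for group in by_label.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_valid = round(len(group) * fractions[1])
        buckets["train"].extend(group[:n_train])
        buckets["validation"].extend(group[n_train:n_train + n_valid])
        # Whatever remains after train and validation goes to test.
        buckets["test"].extend(group[n_train + n_valid:])
    return buckets


# Hypothetical data: 100 records, 10 of them labeled as fraud.
records = [{"id": i, "is_fraud": i % 10 == 0} for i in range(100)]
buckets = partition(records, "is_fraud")
```

Stratifying is only one reasonable choice. For time-dependent data, a split by date may be more appropriate than a random one.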
Machine learning in action
The main objective of applying machine learning techniques to the data preparation process in innovative ways is to find hidden treasures in the data. These treasures can have a positive impact across many facets of a business, such as competitive advantage, regulatory requirements, supply chain fulfillment and optimization, manufacturing health and medical insights. To be specific, here is an exploration of how machine learning can impact a critical business initiative like fraud detection and prevention.
1. Unsupervised learning added to the fraud environment enables organizations to find edge cases in the data and proactively identify abnormal behaviors that traditional methods miss. These abnormal behaviors can then be moved into a supervised learning process, such as regression or classification, to predict whether the outliers are new types of fraudulent activity that require additional investigation.
2. Text analytics provides unique insights by disambiguating data attributes that numerical data can’t identify, which helps uncover unknown patterns between text and traditional data components. These insights may lead to new fraud patterns for consideration.
3. Hibernation can be used for smart alerting, applying a scoring model across all data, both active and historical, to identify new fraud patterns that need attention. This process consolidates scores into one entity-level score for risk assessment and transaction monitoring, helping to identify new, out-of-threshold incidents for additional investigation.
4. Adding automated natural language processing (NLP) to the fraud mix translates complex analytical findings into human language, delivering the information in a way that people can use and understand. Coupling NLP with image recognition helps identify document types using context analytics on text classifications, improving the accuracy of fraud detection.
5. Through dynamic ranking, more data is available for machine learning processes, resulting in more complete cluster analysis, identification of better risk predictors and elimination of false variables. Machine learning teaches itself what normal data conditions look like, then proactively monitors and updates risk scores for more data-driven results.
6. Intelligent due diligence provides entity resolution across product and business lines. Machine learning builds profiles for peer groups and identifies expected behaviors using network and graph analytics. Because machine learning identifies expected behaviors, it can also point out unexpected behaviors that may indicate suspicious activity or a market shift that needs to be addressed.
7. Smart alerting takes traditional ‘alerting’ data and combines it with additional data to unearth new conditions that need to be investigated. With machine learning, the tools can teach themselves which alerts can be handled automatically and which need a human eye. Intelligent detection optimizes existing detection models by including more data and AI/machine learning techniques. It identifies new scenarios from newly combined, targeted subgroups and surfaces additional detections or alerts for consideration.
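As a rough illustration of points 1 and 3 above, the following Python sketch scores transactions without any fraud labels and then rolls the scores up into one score per entity. It assumes a simple robust outlier measure (the modified z-score, based on the median and the median absolute deviation) and takes the maximum transaction score as the entity-level score. The field names, the sample amounts and the 3.5 cutoff are illustrative assumptions, and a production system would likely use richer features and models.

```python
from collections import defaultdict
from statistics import median


def robust_scores(amounts):
    """Modified z-score per amount: 0.6745 * |x - median| / MAD."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)  # median absolute deviation
    if mad == 0:
        # No spread in the data, so nothing stands out.
        return [0.0 for _ in amounts]
    return [0.6745 * abs(a - med) / mad for a in amounts]


def entity_scores(transactions):
    """Consolidate transaction scores into one score per entity (the maximum)."""
    scores = robust_scores([t["amount"] for t in transactions])
    per_entity = defaultdict(float)
    for t, s in zip(transactions, scores):
        per_entity[t["entity"]] = max(per_entity[t["entity"]], s)
    return dict(per_entity)


# Hypothetical transactions; entity C has one unusually large amount.
transactions = [
    {"entity": "A", "amount": 20}, {"entity": "A", "amount": 22},
    {"entity": "A", "amount": 19}, {"entity": "B", "amount": 21},
    {"entity": "B", "amount": 23}, {"entity": "B", "amount": 20},
    {"entity": "C", "amount": 18}, {"entity": "C", "amount": 500},
    {"entity": "C", "amount": 22}, {"entity": "C", "amount": 21},
]

# Entities whose score exceeds the assumed cutoff are queued for review.
flagged = sorted(e for e, s in entity_scores(transactions).items() if s > 3.5)
```

Entities flagged this way are only candidates. As described above, they would typically be reviewed and labeled, and those labels could then feed a supervised model.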
In summary, the machine learning market is exploding, bringing business value to organizations across all industries. Machine learning produces new insights and allows organizations to leverage more, or even all, of their data to make better and smarter decisions. So, let’s start speaking the new machine learning language of data and analytics today!
Kim Kaluba, Senior Manager for Data Management Solutions, SAS