
Why OCR fails and CMR scales - Mind the data digitization gap with cognitive machine reading

(Image credit: Shutterstock/alexskopje)

According to a recent survey from HFS Research, the number one issue preventing early adopters from achieving their automation goals is scalability. In fact, 75 percent of companies that have deployed automation lack an end-to-end integrated process view and analytical insights, so their automation programs remain fragmented. Automation tools such as Robotic Process Automation (RPA) are used to automate certain business processes in specific departments, whilst other facets of the business are neglected.

To solve this issue, companies need to have the right automation strategy in place. This means giving adequate consideration to the automation tools they need to scale and to meet their business KPIs and ROI targets.

This article explores how organizations can select the right technology platforms to become a scalable, integrated business.

Importance of data digitization: why CMR is superior to OCR

The idea of automating some business processes and improving operational efficiency is enough for some C-level executives to adopt AI and deem themselves competitive. This approach to automation is simply inadequate. Business leaders need to scale their enterprise automation efforts companywide so that every department is automated successfully. Otherwise they will find themselves relying on an automation program that is hugely fragmented and yields poor ROI. Every business executive’s goal for implementing AI should be to achieve fully autonomous and unified business operations.

The first step in any automation journey should be capturing, curating, and analyzing your organization’s data. Many organizations struggle to access and make sense of all the data available to them. Business data comes in multiple forms, varies widely depending on which part of the organization you sit in, and requires different tools to digitize. The overarching issue is that early adopters of automation make the mistake of automating only certain processes and neglecting others. This usually stems from poor understanding and management of the various business data in use. As a result, companies can end up automating only one data type in one or two departments, rather than achieving end-to-end automation across the many forms of data found throughout the company.

For instance, Optical Character Recognition (OCR), considered the industry’s traditional data ingestion platform, lifts structured data from documents but cannot read or ingest unstructured data (which comprises 80 percent of an organization’s data). Furthermore, feeding a document into an OCR tool doesn’t guarantee that it will be captured accurately, so manual intervention is often still required. This defeats the whole purpose of automation. OCR cannot ingest and process the different types of business data available and falls down when presented with anything other than structured data (i.e., fixed field text). This leaves large portions of unstructured data untouched, so departments that rely on unstructured data are left behind in an organization’s digital transformation journey.

In stark contrast, when organizations leverage technology that can recognize, classify and extract large amounts of data from all kinds of documents, they can automate entire processes instead of just individual tasks. That means businesses and their stakeholders can benefit from straight-through processing. The key is to home in on processing clean data and digitizing it so that it is automation ready. Every business task within an organization requires the digitization of clean data. Once digitized, this data can flow through the entire organization via an end-to-end process rapidly and efficiently.

Intelligent document processing platforms that leverage Cognitive Machine Reading (CMR) built on integrated AI capabilities can do all of this in one centralized hub with little training or intervention. This is due to the integrated AI components that CMR platforms run on, which include complex computer vision technologies, handwriting recognition and signature verification, and which can identify and read images as well as handwritten documents. (This was one of the challenges faced by a global insurance provider.) With AntWorks’ CMR solution, this insurer realized a more than 65 percent increase in accuracy for handwriting recognition and a 75 percent reduction in manual processing. And over time, the machine learning in CMR increases accuracy, accelerating productivity and shortening time to ROI.

With OCR capturing only a limited share of an organization’s data and quoted as achieving 80 percent accuracy (a rating that applies only to formats and documents it recognizes), it’s no wonder scalability eludes organizations. Enter CMR. CMR’s approach to automation gives enterprises a competitive edge because it:

  • Ingests and processes all data types, including structured data in the form of fixed field text and unstructured data such as emails, images, videos, and handwritten text, which is why it can overcome the challenge of digitizing unstructured data.
  • Doesn’t require new templates to be created each time it needs to ingest and process new data. OCR does, and creating templates on a regular basis can be an incredibly tedious and time-consuming process. CMR is not dependent on template creation and can therefore ingest and digitize data irrespective of its variance or format.
  • Decodes and digitizes special characters in multiple languages through pattern recognition. OCR cannot find the meaning behind data in different languages unless there is some form of human intervention, such as labelling or manually reviewing the content first.
  • Helps users to scan, filter and identify the specific information needed within a document, as long as the parameters for that information are set. OCR is unable to localize and contextualize specific information.
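
To make the template point concrete, here is a minimal Python sketch of why position-bound extraction breaks when a document's layout changes. It is an illustration only, not AntWorks' implementation: the field names, sample documents, `TEMPLATE` offsets and regular expressions are all hypothetical, and the regular expressions are a crude stand-in for the learned models a CMR platform would use.

```python
import re

# Hypothetical fixed-field template: each field is a (line index, start, end) slice.
TEMPLATE = {
    "invoice_no": (0, 12, 20),
    "total": (1, 7, 15),
}


def extract_with_template(text, template):
    """Read each field from a fixed position, as a template-bound extractor would."""
    lines = text.splitlines()
    result = {}
    for name, (row, start, end) in template.items():
        result[name] = lines[row][start:end].strip() if row < len(lines) else ""
    return result


# Illustrative layout-independent patterns (an assumption; far simpler than real CMR).
PATTERNS = {
    "invoice_no": re.compile(
        r"invoice\s*(?:no\.?|number|#)?\s*[:\-]?\s*([A-Z0-9\-]+)", re.I
    ),
    "total": re.compile(
        r"total\s*(?:due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.I
    ),
}


def extract_with_patterns(text, patterns):
    """Locate each field by its label, wherever it appears in the document."""
    result = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else ""
    return result


# The layout the template was built for, and a new layout carrying the same data.
known_layout = "Invoice No: INV-1001\nTotal: 250.00"
new_layout = "ACME Ltd\nAmount summary\nTotal due: $250.00\nInvoice # INV-1001"

print(extract_with_template(known_layout, TEMPLATE))
print(extract_with_template(new_layout, TEMPLATE))
print(extract_with_patterns(new_layout, PATTERNS))
```

On the known layout the template reads both fields correctly. On the new layout it slices the wrong text, so a new template would have to be written, whereas the label-based extractor still finds both values without any change.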

With CMR you can not only expand your automation scope, but also get faster ROI, increased data certainty and continuous improvement in the automation of business processes. There is no company out there that doesn’t stand to benefit from AI and automation. It has never been more important for companies to leverage their data, not only to make employees and processes more efficient but also to continue operating and delivering services in a post-Covid world.

Asheesh Mehra, CEO and Co-founder, AntWorks