To date, a vast amount has been invested in deploying traditional recognition technologies such as OCR, ICR and intelligent word recognition to analyze the content of documents and boost automation. It’s still very much a growth area. Research shows that the global OCR market is expected to reach $13.38 billion by 2025 – increasing at a CAGR of 13.7 percent from 2019.
Don’t panic! You might recall the famous inscription on the cover of The Hitch Hiker’s Guide to the Galaxy, the classic sci-fi book by the late Douglas Adams, which charts the adventures of Arthur Dent and Ford Prefect after the Vogons demolish Earth to make way for a new hyperspace bypass.
The book was published in 1979, and what might not be quite so well known is just how prescient Adams was in referencing technology which has subsequently been developed. The fictitious Hitch Hiker’s Guide itself was almost a precursor to the Kindle – a handheld electronic book able to serve a million pages via a four-inch square screen. The information stored in it is user-generated and constantly updated, exactly the approach adopted by Wikipedia. And the Babel Fish introduced the idea of putting something in your ear which could then translate languages – a concept actually brought to market in 2016 by Waverly Labs with the Pilot Smart Earbuds.
And talking of Babel Fish, rapid developments in self-learning artificial intelligence (AI) platforms – which solve complex problems automatically – are now enabling information and business managers to quickly gain real insight from documents irrespective of the language, the file format used and whether the documents contain machine print, cursive handwriting or both.
This is set to radically change how organizations cope with recognizing and classifying millions of documents and then extracting and validating information without any manual intervention at all, thereby increasing productivity, improving accuracy and saving money.
Traditional recognition technologies have their limitations, however. Many ICR/OCR engines struggle to process a mix of documents – encompassing structured, semi-structured and unstructured data – along with cursive handwriting and historical or old documents, especially when the legibility of the paperwork is poor. The situation is exacerbated when volumes are high. And no single traditional ICR/OCR engine can seamlessly process a variety of languages, jumping from documents in English to Chinese, German and so on.
With such variability, correct read rates drop markedly – it’s still tough to get more than 90-95 percent accuracy today – such that staff are then required to manually rekey information. This is time-consuming and costly, and it raises the question of whether enough trained employees are available to do it.
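To put the read-rate figures in perspective, here is a rough back-of-envelope sketch. It assumes the quoted accuracy applies per field and that every misread field needs rekeying by hand; the function name and the one-million-field volume are purely illustrative.

```python
def fields_to_rekey(total_fields: int, read_rate: float) -> int:
    """Estimate how many fields staff must rekey at a given correct read rate.

    Assumption: read_rate is a per-field accuracy between 0 and 1, and each
    misread field is rekeyed exactly once.
    """
    if not 0.0 <= read_rate <= 1.0:
        raise ValueError("read_rate must be between 0 and 1")
    return round(total_fields * (1.0 - read_rate))


# Illustrative volume only: one million fields at the two quoted read rates.
for rate in (0.90, 0.95):
    backlog = fields_to_rekey(1_000_000, rate)
    print(f"{rate:.0%} read rate: {backlog:,} fields to rekey")
```

Even at the top of the quoted range, a million fields would still leave 50,000 for staff to rekey under these assumptions.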
Of course, crowd-sourcing is a good workaround for enhancing accuracy, and cheaper than actually hiring people. Snippets of data are sent to online data entry clerks logged into an Internet-based system, who then check them before they are input into line-of-business systems.
But the promise - and now reality - of AI is that these challenges are also resolved using powerful cognitive systems.
Utilizing neural networks, AI-driven document processing platforms offer a leapfrog advance over traditional recognition technologies. At the outset, a system is ‘trained’ so that a consolidated core knowledge base is created about a particular (spoken) language, form and/or document type. In AI jargon, this is known as the ‘inference’. This knowledge base then expands and grows over time as more and more information is fed into the system and it self-learns – able to recognize documents and their contents as they arrive.
This is achieved by using a feedback ‘re-training loop’ – think of it as supervised learning overseen by a human – whereby errors in the system are corrected when they arise so that the inference (and the metadata underlying it) updates, learns and is then able to deal with similar situations on its own when they next appear.
It’s not dissimilar to how the human brain works and children learn a language. In other words, the more kids talk, make mistakes and are corrected, the better they get at speaking. The same is true with AI when applied to document analysis and processing. The inference becomes ever more knowledgeable and accurate.
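The feedback loop can be pictured with a deliberately tiny sketch. This is not how a neural network is retrained; the ‘model’ here is just a lookup table of past human corrections, and every name in it (ToyRecognizer, process, reviewer) is hypothetical. It only illustrates the flow: an uncertain reading goes to a person, the correction is stored, and the same case is handled automatically next time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecognizer:
    """Stand-in for a trained system: it only remembers human corrections."""
    corrections: dict = field(default_factory=dict)

    def read(self, raw: str) -> tuple:
        """Return (text, confident); confident only if corrected before."""
        if raw in self.corrections:
            return self.corrections[raw], True
        return raw, False

    def learn(self, raw: str, corrected: str) -> None:
        """Fold a human correction back into the stored knowledge."""
        self.corrections[raw] = corrected


def process(recognizer: ToyRecognizer, raw: str, ask_human) -> str:
    """Read a snippet, escalating to a human only when not confident."""
    text, confident = recognizer.read(raw)
    if not confident:
        text = ask_human(raw)          # human oversight step
        recognizer.learn(raw, text)    # feedback into the knowledge base
    return text


# Hypothetical reviewer who knows that "Sm1th" should read "Smith".
reviewer_calls = []

def reviewer(raw: str) -> str:
    reviewer_calls.append(raw)
    return {"Sm1th": "Smith"}.get(raw, raw)


model = ToyRecognizer()
first = process(model, "Sm1th", reviewer)   # escalated to the reviewer
second = process(model, "Sm1th", reviewer)  # handled without the reviewer
print(first, second, len(reviewer_calls))
```

The second pass never reaches the reviewer, which is the point of the loop: each correction reduces the manual work left for the next batch.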
AI-based systems can be trained to automatically recognize specific forms, review specific content and its layout on the page and then convert cursive handwriting into standard electronic formats such as PDF or JSON for analysis or workflow purposes, with validation and verification also taking place. This can also be done at a field-based level so that key value extraction can be completed. Admittedly this is something that ICR/OCR systems can also do, but they struggle to recognize cursive handwriting and require complex algorithms to find the fields.
Key value extraction on a form, for example, could be a generic box for ‘name’ or ‘age’ – the key – and then the specific values would be ‘Mr John Smith’ and ‘50’. Or on an invoice, the keys are items purchased and the values are the prices paid for each different one.
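As a minimal sketch of what the output of key value extraction might look like, the snippet below turns recognized text into JSON. It assumes the recognition step has already produced one ‘key: value’ pair per line, which is a simplification for illustration; the function name and input layout are hypothetical, and real systems locate fields from the page layout rather than from a tidy text format.

```python
import json
import re

# Hypothetical recognized text, one "key: value" pair per line (assumed format).
RECOGNIZED_TEXT = """\
Name: Mr John Smith
Age: 50
"""


def extract_key_values(text: str) -> dict:
    """Split each 'key: value' line into a dictionary entry."""
    pairs = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            key, value = match.groups()
            pairs[key.lower()] = value  # normalize keys to lower case
    return pairs


fields = extract_key_values(RECOGNIZED_TEXT)
print(json.dumps(fields, indent=2))
```

The same shape works for the invoice case, with each item purchased as a key and its price as the value.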
The benefits here are clear. Governments, healthcare providers, banks and insurance firms have to process a vast number of handwritten forms with identical formats for various purposes like questionnaires, applications, personal loans, mortgages or claims. Extracting the handwritten information from them and converting it into a digital format without human intervention reduces manual errors, lowers cost, allows big data analytics and makes turnaround considerably faster.
And the speed of this AI-based processing is impressive. Anywhere up to 50,000 pages per hour can be completed using a single server – with bigger deployments and cloud delivery also possible when more compute power is added.
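For a sense of scale, the quoted throughput can be turned into a quick capacity estimate. The sketch below takes the 50,000 pages per hour figure as a best case per server and assumes throughput scales linearly as servers are added, which real deployments may not achieve; the function names and example volumes are illustrative.

```python
import math

PAGES_PER_HOUR = 50_000  # upper figure quoted above, per server


def hours_needed(pages: int, servers: int = 1) -> float:
    """Hours to process a backlog, assuming linear scaling across servers."""
    return pages / (PAGES_PER_HOUR * servers)


def servers_needed(pages: int, deadline_hours: float) -> int:
    """Smallest whole number of servers that clears the backlog in time."""
    return math.ceil(pages / (PAGES_PER_HOUR * deadline_hours))


print(hours_needed(1_000_000))        # one server, one million pages
print(servers_needed(1_000_000, 8))   # the same backlog in an 8-hour shift
```

On these assumptions a million-page backlog takes 20 hours on one server, or three servers to clear within an eight-hour shift.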
A reliable option
A wide range of common file types can be ingested for analysis, such as plain text, PDF, TIFF, JPEG, GIF, PPM, PNG and so on, with several neural nets then reading the text and classifying the type – whether it be handwriting or machine print – and with ‘fuzzy search’ aiding the text-to-digital conversion process. And class-leading AI systems, in addition to handling paper documents, are designed to cope with pictures, video and audio, too. Put another way, they are content-agnostic and can handle any source content.
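How ‘fuzzy search’ is applied inside any given product is not spelled out here, but one common reading is approximate string matching: a misread word is snapped to the closest entry in a list of expected terms. The sketch below shows that idea using Python’s standard difflib module; the vocabulary, the cutoff value and the function name are assumptions chosen for illustration.

```python
import difflib

# Hypothetical list of terms the system expects to see on its forms.
VOCABULARY = ["invoice", "insurance", "claim", "address"]


def fuzzy_correct(token: str, cutoff: float = 0.8) -> str:
    """Return the closest vocabulary term, or the token unchanged if none is close."""
    matches = difflib.get_close_matches(token.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token


print(fuzzy_correct("lnvoice"))  # a typical misread: lower-case L for i
print(fuzzy_correct("zzz"))      # nothing similar, so left as is
```

A higher cutoff makes the match stricter, trading fewer false corrections for more words left untouched.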
This is real stuff. One German insurance firm is working over the next six years to shift its entire claim process to use an AI-powered system such that claims under a certain value will be handled automatically based on information extracted, assessed and approved from a form with no human involvement required at all. This will be accomplished as the AI solution automatically checks the name, address, insurance number and other key details about a given incident – capturing all the data from the form correctly first time every time.
When it comes to document processing, seeing AI in action is impressive. It’s ‘wow’ magical stuff to watch a machine ‘read’ a scanned paper document and extract data from it.
One of the consequences of the Covid-19 pandemic and the economic fallout from it is that many companies will want to improve efficiency in a bid to save money. Those with a significant cost and operational overhead in processing forms and other documentation may feel a sense of corporate anxiety or even alarm about how to do this.
As The Hitch Hiker’s Guide helpfully advised on its cover, don’t panic. AI has matured sufficiently that it is now a performant and reliable real-world option for companies grappling with millions of paper documents.
Ashley Keil, VP sales, EMEA/APAC, IBML