
How AI can enable smart and savvy content management

(Image credit: Geralt / Pixabay)

Many modern organisations have considered deploying Artificial Intelligence (AI) to address specific challenges or improve particular business functions. A number of them are already on their own AI journey, and the ROI and benefits of AI are becoming clearer and more tangible.

One area of enterprise IT where AI is having an increasing impact is information management. For the most part, information management-related AI use cases have focused either on simple classification of content during capture or ingestion, or on more advanced and intelligent optical character recognition (OCR).

These use cases provide excellent value but miss a major opportunity: applying AI to the massive volumes of content and data that already exist within most organisations - the ‘digital landfill’ - to get true business value from that information.

There are four main ways in which AI can help enable smart and savvy content management:

1. Metadata enrichment

From an information management viewpoint, arguably the most important type of information is metadata - or information about information. Within old-school document management and enterprise content management (ECM) systems, each stored document became the focal point for processes such as invoice processing and claims management, and each of those documents had a limited, fixed set of metadata attributes (or tags) associated with it.

Changing metadata schemas required a lot of development work along with mass updates to all content related to that metadata. However, metadata schemas in an AI-infused Content Services Platform (CSP) are flexible and extensible. In addition, much more metadata is being stored and used than ever before - image resolutions, language of a document, geophysical data, and more.

This increased capability, and the ability to use metadata much more effectively, is a distinct benefit of a modern CSP over traditional document management and ECM solutions - but what about the content stored in those legacy solutions?

Another powerful feature of a CSP is that it can connect to content in multiple systems, whether on-premises or in the cloud. The content itself stays in place, but the CSP still provides access to that content and data. It also lets legacy content use a modern metadata schema from the CSP, effectively enriching legacy content with metadata properties without making any changes to the legacy system at all. This is hugely powerful, especially when combined with AI to automate the enrichment.
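To make the "enrich without touching the legacy system" idea concrete, here is a minimal Python sketch, assuming a CSP that keeps its own metadata store keyed by the legacy document ID. The names `LegacyDocument`, `MetadataOverlay` and `expiry_date` are hypothetical illustrations, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class LegacyDocument:
    """A document as it exists in the legacy repository; this sketch never modifies it."""
    doc_id: str
    customer_ref: str


@dataclass
class MetadataOverlay:
    """Hypothetical CSP-side store: enrichment metadata keyed by legacy document ID."""
    schema: set[str]
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def extend_schema(self, *names: str) -> None:
        # Adding attributes changes only the overlay schema; no mass update of content.
        self.schema.update(names)

    def enrich(self, doc: LegacyDocument, **attrs: str) -> None:
        # Reject attributes that the current schema does not define.
        unknown = set(attrs) - self.schema
        if unknown:
            raise ValueError(f"attributes not in schema: {sorted(unknown)}")
        self.records.setdefault(doc.doc_id, {}).update(attrs)

    def view(self, doc: LegacyDocument) -> dict[str, str]:
        # Merge the legacy attribute with CSP-side enrichment for presentation.
        return {"customer_ref": doc.customer_ref, **self.records.get(doc.doc_id, {})}
```

Because the enrichment lives only in the overlay, extending the schema is a one-line change and the legacy record is never rewritten.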

Imagine this scenario: you have a legacy ECM repository containing customer documents. Despite the best intentions of your staff, the only relevant metadata attribute associated with contracts is the customer reference number. By using a CSP to pass that content through an AI enrichment engine, you can potentially enrich every one of the files currently stored with additional metadata attributes, such as the contract expiration date, which can then automatically initiate a workflow to ensure the internal contract owner follows up before that date. This injects more context, intelligence, and insight into your information management ecosystem.
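The expiry-date scenario can be sketched in a few lines of Python. This is a simplified stand-in, assuming dates appear in ISO format after the word "expires" or "expiry date"; a real AI enrichment engine would handle far messier text. `follow_up_task` and its payload fields are hypothetical, not a specific workflow engine's format.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Stand-in for an AI enrichment engine: find an ISO date shortly after
# "expires" or "expiry date". Real contracts need a far more robust extractor.
EXPIRY_PATTERN = re.compile(
    r"\b(?:expires|expiry date)\D{0,20}(\d{4}-\d{2}-\d{2})", re.IGNORECASE
)


def extract_expiry(text: str) -> Optional[date]:
    """Return the first expiry date found in the text, or None."""
    match = EXPIRY_PATTERN.search(text)
    return date.fromisoformat(match.group(1)) if match else None


def follow_up_task(doc_id: str, owner: str, expiry: date, lead_days: int = 60) -> dict:
    """Hypothetical workflow payload asking the contract owner to act before expiry."""
    return {
        "doc_id": doc_id,
        "assignee": owner,
        "due": expiry - timedelta(days=lead_days),
        "action": "review contract renewal",
    }
```

The `lead_days` parameter is an assumed policy knob: how far ahead of expiry the owner should be prompted.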

The AI engine could also identify:

  • The type of each document - contract, correspondence, invoice, etc.
  • Documents containing personally identifiable information (PII), which may then automatically trigger additional security controls and provisions required by privacy policies or regulations.
  • Documents that should be managed according to records and retention policies.
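As a rough illustration of those three outputs, the Python sketch below tags a document with a type, any PII found, and whether it falls under retention policy. Simple keyword and regex rules stand in for a trained AI engine here; the keyword lists, PII patterns and `RECORD_TYPES` are assumptions chosen for the example.

```python
import re

# Hypothetical keyword rules standing in for a trained document-type classifier.
TYPE_KEYWORDS = {
    "invoice": ("invoice number", "amount due"),
    "contract": ("agreement", "hereinafter", "terms and conditions"),
    "correspondence": ("dear", "regards"),
}
# Deliberately simple PII patterns; real PII detection covers many more identifiers.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d ]{8,}\d"),
}
RECORD_TYPES = {"contract", "invoice"}  # assumption: types covered by retention policy


def analyse(text: str) -> dict:
    lowered = text.lower()
    # Score each type by how many of its keywords appear in the text.
    scores = {t: sum(k in lowered for k in kws) for t, kws in TYPE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    doc_type = best if scores[best] > 0 else "unknown"
    pii = sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))
    return {
        "document_type": doc_type,
        "contains_pii": pii,  # non-empty list could trigger extra security controls
        "retention_managed": doc_type in RECORD_TYPES,
    }
```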

2. Identification of important content and data

A key part of enriching metadata is the ability to ascertain ‘what is what.’ There are many uses for this, starting with simply identifying a document as a presentation, brochure, contract, invoice, etc. This capability is a core facet of knowledge management, but without good metadata on the content, it is simply not possible.

But beyond that, many industries have strict compliance regulations that require different types of documents and records to be kept for a specific period of time - retention policies or rules. There were typically two ways to do this in the past - manually, or not at all. The manual approach was tedious, error-prone, and time-consuming - which led to a lot of organisations adopting the ‘not at all’ approach.

But by using an AI-driven engine to classify content stored within legacy systems, this becomes much easier to do. Even simple AI tools can tell the difference between a contract and a resume, but advanced engines extend this principle by building AI models from content specific to an organisation. These can deliver much more detailed classifications than generic classification ever could.
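To show what "a model built from an organisation's own content" means in miniature, here is a multinomial naive Bayes text classifier written in plain Python. It is one simple technique chosen for illustration, not the method any particular AI engine uses, and the labels in the usage note (`maintenance_contract`, `supply_contract`) are hypothetical organisation-specific categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayesClassifier":
        # Count words per label and documents per label from labelled examples.
        for text, label in examples:
            words = tokens(text)
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokens(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a handful of labelled internal documents, for example pairs like `("Maintenance agreement for turbine servicing and inspection visits", "maintenance_contract")`, it separates two kinds of contract that a generic "contract vs resume" classifier would lump together. Production engines would use far larger training sets and stronger models.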

3. Getting rid of unwanted information and content

The “keep it all just in case” approach not only exacerbates the digital landfill effect, but also means that a lot of information that could (and often should) have been destroyed isn’t. Aside from the cost of storing this content ad infinitum, there are significant legal issues that arise from keeping information longer than you need to. AI can help mitigate this problem significantly.

Part of the challenge of managing records, or even simply applying retention policies, is the sheer volume of content that needs to be managed. In the past, the only way to work through it was document by document. Because of the legal ramifications of incorrectly declaring (or failing to declare) a record, most organisations still want a human checkpoint as part of this process.

By using AI classification of content with a CSP, it is possible to determine quickly and at massive scale what is NOT a record. According to numerous research studies, the significant majority of stored content is ROT (redundant, obsolete, or trivial) - so by clearing out huge chunks of that ROT, the task of identifying the content to which retention policies apply becomes much, much easier.
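A first pass at flagging ROT can be sketched in Python with simple heuristics: exact duplicates by content hash (redundant), throwaway file types (trivial), and long-untouched files (obsolete). These rules are a swapped-in, deliberately basic substitute for AI classification, and the thresholds and extensions are assumptions; flagged items are candidates for human review, not automatic deletion.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    path: str
    content: bytes
    last_modified: date


TRIVIAL_EXTENSIONS = {".tmp", ".bak", ".log"}  # assumption: tune per organisation


def flag_rot(items: list[Item], today: date, obsolete_after_years: int = 7) -> dict[str, str]:
    """Return {path: reason} for items that look redundant, trivial or obsolete."""
    seen_hashes: set[str] = set()
    flagged: dict[str, str] = {}
    for item in items:
        digest = hashlib.sha256(item.content).hexdigest()
        age_years = (today - item.last_modified).days / 365.25
        if digest in seen_hashes:
            flagged[item.path] = "redundant"  # byte-identical to an earlier item
        elif any(item.path.endswith(ext) for ext in TRIVIAL_EXTENSIONS):
            flagged[item.path] = "trivial"
        elif age_years > obsolete_after_years:
            flagged[item.path] = "obsolete"
        seen_hashes.add(digest)
    return flagged
```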

AI can then be used on the remaining content to identify the type of content in more detail, match that to the retention rules, and then make recommendations to the relevant staff members. This makes the whole process of identifying, declaring and managing records incredibly straightforward, much more scalable than before, and much more cost effective.
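The recommendation step might look something like the Python sketch below, assuming the classifier returns a document type plus a confidence score. The retention schedule, the 0.8 confidence threshold and the `Recommendation` fields are hypothetical; the key design choice is that nothing is declared until a staff member sets `approved`, and low-confidence results go to manual classification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical retention schedule: document type -> years to retain.
RETENTION_YEARS = {"invoice": 7, "contract": 10}


@dataclass
class Recommendation:
    doc_id: str
    doc_type: str
    action: str
    review_on: Optional[date]
    approved: bool = False  # stays False until a staff member signs off


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def recommend(doc_id: str, doc_type: str, created: date, confidence: float,
              threshold: float = 0.8) -> Recommendation:
    # Low-confidence classifications are routed to a human rather than guessed.
    if confidence < threshold:
        return Recommendation(doc_id, doc_type, "manual classification", None)
    years = RETENTION_YEARS.get(doc_type)
    if years is None:
        return Recommendation(doc_id, doc_type, "not a record", None)
    return Recommendation(doc_id, doc_type, f"declare record, retain {years} years",
                          add_years(created, years))
```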

4. Using machine learning models on your own data

But what really makes a difference when using AI is an organisation's ability to train and deploy its own custom AI models. When an enterprise uses its own data to train AI models tailored to the unique needs of its business, the AI engine can provide more accurate information about each document or asset, then extract that information and apply it as metadata.

Metadata attributes and tags help users find and retrieve content, but this data is often added by hand, which is both error-prone and limiting. Automated entity extraction using business-driven AI adds another level of value by delivering more attributes, with greater accuracy, and at a much faster pace than ever before. The key is that the metadata attributes are specific to the business: for example, tags can carry detailed product names, part numbers, customer accounts, etc., which immediately makes them more useful. This in turn drives applications such as automated image and content capture, automated launching of workflows and related business processes, and even associating new content or assets with pending tasks or work assignments.
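To illustrate business-specific tagging and workflow launching, the Python sketch below extracts product names, part numbers and customer accounts from text and decides which workflows to start. A product catalogue lookup and regular expressions stand in for a custom-trained extraction model; the identifier formats (`PN-12345`, `ACC-123456`), product names and workflow names are all hypothetical.

```python
import re

# Hypothetical organisation-specific vocabulary and identifier formats.
PRODUCT_CATALOGUE = {"AquaPump 300", "AquaPump 500", "FlowMeter X"}
PATTERNS = {
    "part_number": re.compile(r"\bPN-\d{5}\b"),
    "customer_account": re.compile(r"\bACC-\d{6}\b"),
}


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return de-duplicated, sorted tags per entity type; empty types are dropped."""
    tags = {name: sorted(set(p.findall(text))) for name, p in PATTERNS.items()}
    tags["product"] = sorted(p for p in PRODUCT_CATALOGUE if p in text)
    return {k: v for k, v in tags.items() if v}


def route(tags: dict[str, list[str]]) -> list[str]:
    """Hypothetical routing: decide which workflows the new content should launch."""
    workflows = []
    if "customer_account" in tags:
        workflows.append("attach_to_customer_case")
    if "part_number" in tags:
        workflows.append("notify_parts_team")
    return workflows
```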

Training machine learning models on your own data sets and deploying them is a powerful proposition indeed. It takes AI in information management to the next level and enables organisations to begin practising truly smart and savvy content management.

Dave Jones, VP of Product Marketing, Nuxeo