
Enterprise investments in NLP are increasing, but accuracy and cost still impact implementation


The Natural Language Processing (NLP) market is expected to grow from $10.2 billion in 2019 to $26.4 billion by 2024, according to research from MarketsandMarkets™. The major growth factors of the NLP market include the increase in smart device usage, the adoption of cloud-based solutions, NLP-based applications that improve customer service, and growing use in the healthcare industry.

With sights set on the future of NLP, there’s a lot to look forward to - improved patient care, better customer service, and simply reaping the benefits of technology we don’t even know is there to help us complete tasks faster - regardless of what industry you’re in. But we can’t get to the promised future of NLP unless we understand what’s going on in the present. To achieve this, John Snow Labs commissioned the first NLP Industry Survey, exploring how enterprise companies are currently using NLP technologies. The survey was conducted by Gradient Flow, an independent data science analysis and insights provider.

The global survey - which queried nearly 600 respondents from more than 50 countries - gives a comprehensive view into the 2020 state of NLP adoption and implementation by considering several important contrasts. This includes analyzing organizations with years of history deploying NLP applications in production compared to those which are just exploring NLP, technical leaders compared to general respondents, company size and location, and scale of documents.

By exploring current practices, challenges, and triumphs of NLP, we can begin to understand the state of enterprise adoption and implementation, and how to unleash NLP’s full growth potential in the coming years.

Spark NLP reigns supreme

You’re only as good as the tools you use, right? More than half of all respondents (53 percent) used at least one of the top two libraries: Spark NLP and spaCy. More specifically, a third of all respondents stated they use Spark NLP, making it the most popular NLP library in the survey. A quarter of all respondents stated they use spaCy, while AllenNLP, a newer PyTorch-based library for NLP research, secured the spot as the third most popular library. The most popular libraries varied slightly across several key industry sectors: Healthcare (Spark NLP), Technology (spaCy), and Financial Services (NLTK).

When evaluating the suitability of NLP libraries, it’s important to understand that they typically provide pipelines in which a machine learning model is applied at each stage. This means users must gauge the effectiveness of a multi-stage pipeline as a whole for a given application. The recent adoption of deep learning models in these pipeline stages has dramatically increased the accuracy of these libraries overall, enabling a much broader range of business use cases. Speaking of accuracy...
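To make the pipeline idea concrete, here is a minimal, library-agnostic sketch in Python. The stage names and the toy entity list are illustrative assumptions, not the API of Spark NLP, spaCy, or any library named in the survey; real pipelines replace each stage with a trained (often deep learning) model, which is why end-to-end accuracy depends on every stage.

```python
from typing import Callable, List

def tokenize(text: str) -> List[str]:
    # Stage 1: split raw text into lowercase word tokens.
    return text.lower().split()

def remove_stopwords(tokens: List[str]) -> List[str]:
    # Stage 2: drop common words that carry little signal.
    stopwords = {"the", "a", "an", "is", "in", "of", "about"}
    return [t for t in tokens if t not in stopwords]

def extract_entities(tokens: List[str]) -> List[str]:
    # Stage 3: a toy stand-in for a trained named-entity recognizer.
    known_entities = {"docker", "kubernetes", "spark"}
    return [t for t in tokens if t in known_entities]

def run_pipeline(text: str, stages: List[Callable]):
    # Each stage consumes the previous stage's output, so an error
    # early in the pipeline propagates to every later stage.
    result = text
    for stage in stages:
        result = stage(result)
    return result

entities = run_pipeline(
    "The talk is about Docker in production",
    [tokenize, remove_stopwords, extract_entities],
)
print(entities)  # ['docker']
```

Evaluating a library therefore means evaluating the composed chain, not any one model in isolation.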

Accuracy is king

More than 40 percent of all respondents cited accuracy as the most important criterion they use to evaluate NLP libraries. In turn, a quarter of respondents cited accuracy as the main criterion when evaluating NLP cloud services. Accuracy here refers to that of the pre-trained models used in the multi-stage pipelines of NLP libraries. These models let users input text and get common outputs, but customizing them can present challenges.

For example, language is very application- and domain-specific, which makes it especially painful when a model trained on general usage of words does not recognize or disambiguate terms of art for a specific domain. In this case, a speech-to-text service transcribing video from a DevOps conference might render the name “Docker” as the word “doctor,” degrading the accuracy of the technology. Just imagine how detrimental this could be when processing electronic health records or legal documents.
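One common lightweight mitigation is to post-process a general-purpose transcript with a domain lexicon. The sketch below is a hypothetical illustration of that idea in plain Python - the lexicon entries and the `apply_domain_lexicon` helper are assumptions for this example, not a feature of any transcription service mentioned above.

```python
# Hypothetical domain lexicon mapping common mis-transcriptions
# (as heard by a general speech-to-text model) to DevOps terms of art.
DOMAIN_LEXICON = {
    "doctor": "Docker",            # container platform, not a physician
    "cooper netties": "Kubernetes",
}

def apply_domain_lexicon(transcript: str, lexicon: dict) -> str:
    # Replace each mis-heard phrase with the intended domain term.
    corrected = transcript
    for heard, term in lexicon.items():
        corrected = corrected.replace(heard, term)
    return corrected

print(apply_domain_lexicon("deploy the app with doctor", DOMAIN_LEXICON))
# deploy the app with Docker
```

Note the trade-off this illustrates: the same substitution would be wrong in a healthcare transcript, where “doctor” is usually exactly the right word - which is why domain-specific models, rather than blanket string fixes, are what enterprises ultimately need.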

Cloud services come with challenges

77 percent of all survey respondents indicated that they use at least one of the four NLP cloud services listed in the survey (Google, AWS, Azure, IBM), with Google’s service topping the list. Google Cloud is particularly popular among respondents who are still in the early stages of adopting NLP, but cloud usage rates drop slightly when looking at companies that have more experience in deploying NLP.

That said, 65 percent of respondents working at companies further along the NLP adoption curve still use at least one of the NLP cloud services. Despite the popularity of cloud services, respondents cited cost as the key challenge here. There are also concerns about extensibility, since so many NLP applications depend on domain-specific language use and cloud providers have been slow to service these market needs.

The data and use cases feeding NLP

In terms of feeding the NLP beast, data from files and databases top the list of data sources used to provide NLP projects’ sustenance. 61 percent of technical leaders surveyed stated that they used files - pdf, txt, docx, etc. - for their NLP systems. More than a third (36 percent) of this group also indicated that their organization used a text annotation tool for labeling training data for NLP.

Once fed the data, it’s time to put NLP to use, and the four most popular applications are Document Classification, Named Entity Recognition (NER), Sentiment Analysis, and Knowledge Graphs. Document Classification and NER are by far the most popular use cases among respondents from organizations further along the NLP adoption curve. Respondents from healthcare cited de-identification (38 percent) as another common NLP use case - one which, prior to being automated by NLP, had been a manual and labor-intensive process. This is especially important due to privacy regulations that require healthcare users to strip medical records of any protected health information.
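To show what de-identification does at its simplest, here is a minimal rule-based sketch that masks phone numbers and dates in a clinical note. The patterns and the sample record are hypothetical, and production systems rely on trained NER models to catch names, addresses, and other protected health information that simple regexes miss.

```python
import re

# Toy patterns for two kinds of protected health information.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def deidentify(text: str) -> str:
    # Replace each matched span with a placeholder label so the
    # record can be shared without exposing the original values.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

record = "Patient seen on 03/14/2020, callback 555-867-5309."
print(deidentify(record))
# Patient seen on [DATE], callback [PHONE].
```

Automating this step is what turns a labor-intensive manual review into a scalable part of an NLP pipeline.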

So, what’s next for NLP growth?

While it’s no surprise that NLP is on the rise, it was surprising to see the level of investment in the technology - especially considering when the survey was conducted. From July 5 to August 14, 2020, in the thick of the global Covid-19 pandemic, respondents still indicated spending was increasing consistently, and in many cases significantly. In fact, 53 percent of technical leaders indicated their NLP budget was at least 10 percent higher than in 2019, with 31 percent stating their budget was at least 30 percent higher than the previous year. The same trend applies to large companies (those with more than 5,000 employees), among which 61 percent of respondents cited budget increases in 2020.

These findings are especially encouraging, given worldwide IT spending is projected to decline 8 percent from 2019, according to the latest forecast by Gartner, Inc. The coronavirus pandemic and effects of the global economic recession are causing technology leaders to prioritize spending on mission-critical initiatives as opposed to those aimed at growth or transformation. Fortunately, investment in NLP checks all three boxes.

David Talby, CTO, John Snow Labs

David Talby, PhD, is the CTO at John Snow Labs, an award-winning AI and NLP company, and developer of the Spark NLP library.