Semantic Search and Question Answering in Enterprise

Learn how NLP enables a new kind of enterprise search application

Semantic search at a glance

For decades, semantics were foreign to computers. For instance, if you wanted to teach a computer that two words were synonyms, you had to convey that information via hard-coded rules. Modern search systems, however, don’t need rules. Without being explicitly taught how to do it, they understand the semantic relationships between the following sentences almost as well as humans do:

  1. The cat seems happy.
  2. This kitten looks entertained.
  3. Antoine likes ice cream.

Nobody has to tell you that sentence one is similar to sentence two, even though they don’t share any words — and that both, in turn, are very different from the third sentence. That’s because when we evaluate the content of a sentence, we try to grasp its meaning as a whole.

Semantic search emulates that capacity by comparing documents on the basis of semantic similarity. It excels at that thanks to modern neural network architectures known as “Transformer” models. These pre-trained language models are loosely inspired by the brain: they wire together millions of interconnected “neurons” and learn language by being exposed to a large volume of text during training.
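To make "comparing on the basis of semantic similarity" more concrete, here is a minimal sketch in Python. It assumes each sentence has already been turned into a vector of numbers (an "embedding") and then compares those vectors with cosine similarity, a common choice for this kind of comparison. The three-dimensional vectors below are hand-picked for illustration only; in a real system they would be produced by a Transformer model and would have hundreds of dimensions. The function names are hypothetical as well.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# ASSUMPTION: toy, hand-picked vectors standing in for model-generated
# embeddings. They are not the output of any real language model.
embeddings = {
    "The cat seems happy.": [0.9, 0.8, 0.1],
    "This kitten looks entertained.": [0.8, 0.9, 0.2],
    "Antoine likes ice cream.": [0.1, 0.2, 0.9],
}


def rank_by_similarity(query, vectors):
    """Rank every other sentence by its similarity to the query sentence."""
    query_vector = vectors[query]
    scored = [
        (sentence, cosine_similarity(query_vector, vector))
        for sentence, vector in vectors.items()
        if sentence != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity("The cat seems happy.", embeddings)
for sentence, score in ranking:
    print(f"{score:.2f}  {sentence}")
```

With these toy vectors, the kitten sentence ranks well above the ice cream sentence even though it shares no words with the query, which is the behavior the three example sentences are meant to illustrate.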

What is “question answering”?

In the past few years, question answering has become one of the most common technical use cases for NLP. It is now often integrated into enterprise search applications to deliver a Google-like experience for end users. Still, the concept remains unclear to many.

People ask questions to better understand the world around them. Whether it’s about trying to learn more about a person or a topic, posing a question allows us to get the information that we need to form a clearer picture. In the context of “information retrieval” from a collection of documents, you can similarly use full questions to better understand your textual data or to extract specific information from that data.

Automated systems for question answering (QA) have recently undergone a dramatic overhaul. Thanks to Transformer-based language models and semantic search technology, these systems can now accept a user’s query in natural language and extract an accurate and informative answer from a large collection of documents, often within a fraction of a second.
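One common way to structure such a system is in two stages: a first stage narrows a large collection down to a few candidate documents, and a second stage extracts the answer from those candidates. The sketch below illustrates only that flow. It deliberately swaps the Transformer models for a naive word-overlap score so that it runs with the Python standard library alone, which means it matches words rather than meaning. The function names, the documents, and the scoring are all assumptions made for illustration; they do not represent any particular framework's API.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, text):
    """Count shared tokens. ASSUMPTION: a crude stand-in for a semantic model."""
    return len(tokenize(query) & tokenize(text))


def retrieve(query, documents, top_k=2):
    """Stage 1: keep the top_k documents most related to the query."""
    ranked = sorted(documents, key=lambda d: overlap_score(query, d), reverse=True)
    return ranked[:top_k]


def read(query, candidates):
    """Stage 2: return the single best-matching sentence from the candidates."""
    sentences = []
    for document in candidates:
        parts = re.split(r"(?<=[.!?])\s+", document)
        sentences.extend(part.strip() for part in parts if part.strip())
    return max(sentences, key=lambda s: overlap_score(query, s))


# Hypothetical internal documents, invented for this example.
documents = [
    "Expense reports are due on the fifth of each month. "
    "Submit them through the finance portal.",
    "The cafeteria opens at eight. Lunch is served from noon.",
    "New laptops are ordered through the IT help desk. "
    "Delivery takes about two weeks.",
]

query = "When are expense reports due?"
answer = read(query, retrieve(query, documents))
print(answer)
```

Splitting the work this way keeps the expensive answer-extraction step confined to a handful of candidates, which is one reason a two-stage design can stay fast even over large collections.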

Textual data accumulates non-stop, and it’s increasingly important to be able to make sense of that data through an easy-to-use natural language interface. Boosting text analysis, mining reports, implementing an internal QA system for your employees, using QA to create new metadata, or revamping customer service applications are just some examples of how a QA system can be used.

Building NLP applications, not “models”

Despite the popularity of Transformer models, building production-ready enterprise solutions on top of them can be complex. Common deep-learning frameworks are hard for non-researchers to use and are also hard to integrate with modern application architectures.

Many teams involved in the implementation of NLP-driven products are still primarily focused on solving the “model problem.” This leads to a long, research-intensive cycle of building a proprietary dataset, training a model, evaluating its performance, and so on. Much of this “undifferentiated heavy lifting” can be avoided by picking a readily available pre-trained, open source Transformer model and iterating fast.

For enterprise product teams to succeed, a different approach to application-level NLP is required: one that is closely aligned with the modern development lifecycle. NLP solutions can be implemented in fast, agile-style iterations with a short time to value.

Our open source Haystack NLP framework and the deepset Cloud NLP application development platform have been successfully utilized by many teams to build top-notch NLP solutions. By combining the power of pipeline-oriented architecture, Transformer-based language models, and developer-friendly tooling, Haystack has become the technology of choice for many enterprises.

In turn, deepset Cloud offers a fully managed platform that helps enterprise product teams build API-driven NLP backends. It offers the tools needed for populating the document store, prototyping and evaluating pipelines, deploying them, and collecting user feedback, all while following best practices from the modern software development and MLOps lifecycle. Building on Haystack’s pragmatic technology stack, deepset Cloud provides the components required to quickly build a variety of semantic search systems that can be easily integrated with modern enterprise applications.

For more information, please visit our website.