Understanding Semantic Search

A timely overview of the landscape and jargon of semantic search and question answering systems.

The Haystack framework is centered on question answering (QA). The concept of “search” is a big part of Haystack’s functionality, but how does it tie in? Well, a QA pipeline’s task is to first find the right documents. It then goes on to find the correct answer or answers within those documents. Because modern search and question answering systems leverage the latest generation of neural networks, they can be grouped under the general term “neural search.”

Newcomers to neural search might find the language around question answering pipelines and search systems in general to be confusing or even intimidating. In this article, we explain key concepts and terms of neural search systems. In the process, we hope to help you find the flavor of search that’s right for you.

Simple Search Paradigms

Before there was neural search, there were much simpler methods, like exact string matching. String matching lets you filter for a certain term on a website or within a document. It’s still in use in many databases, such as library catalogs or newspaper archives.
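
Here’s a toy sketch of exact string matching in plain Python (the documents and query are invented for illustration):

```python
# A toy corpus of "documents" (invented for illustration).
documents = [
    "The library opens at 9 am on weekdays.",
    "Our newspaper archive dates back to 1892.",
    "Visit the reading room on the second floor.",
]

query = "archive"

# Exact string matching: return every document that contains
# the query verbatim, in no particular order.
matches = [doc for doc in documents if query in doc]
print(matches)  # ['Our newspaper archive dates back to 1892.']
```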

Exact string matching will return all those documents that contain your search term, in no particular order. For a large text collection, this method won’t get you far. Imagine googling a term and, instead of being fed the most relevant results, getting back every website that matches your search terms, in random order. To increase the relevance of search results, we can use keyword matching. In keyword matching methods like tf-idf or BM25, the terms in a document are weighted according to their relevance within the corpus. These methods are fast and language-independent.

On the downside, keyword matching operates on a simplifying “bag of words” principle. This means that it doesn’t preserve the order of words in a document. But word order is important for the meaning of a sentence: it contributes to our understanding of the syntactic and semantic relationships between words. Here’s what a keyword-based search could look like.
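
This sketch uses the third-party rank_bm25 package (one of several BM25 implementations); the corpus is invented:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

documents = [
    "The museum exhibits modern art from Europe.",
    "Modern art often provokes strong reactions.",
    "The gift shop sells postcards and posters.",
]

# BM25 operates on tokenized text; a simple lowercase split
# is enough for this illustration.
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

query = "modern art".split()

# Score every document against the query and rank by relevance.
scores = bm25.get_scores(query)
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.2f}  {doc}")
```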

Semantic Search: A Paradigm Shift

In semantic (or "neural") search, rather than comparing a query (e.g. a string or an image) directly to the database, we run it through a neural network that has been pre-trained on millions of data points. The neural model has learned to encode a query as a high-dimensional vector. This high-dimensionality allows neural models to better capture a query’s meaning, or semantic value. Neural search can be used on a variety of file types, including images.
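
For text, the encoding step might look like the following sketch, which assumes the sentence-transformers library and the all-MiniLM-L6-v2 model (both just example choices; any sentence encoder would do):

```python
from sentence_transformers import SentenceTransformer, util

# Load a pre-trained sentence encoder.
model = SentenceTransformer("all-MiniLM-L6-v2")

# The encoder maps each text to a high-dimensional vector.
query_embedding = model.encode("How do I return a damaged item?")
print(query_embedding.shape)  # (384,) for this model

# Semantically similar texts end up close together in vector space,
# even when they share almost no keywords.
other_embedding = model.encode("What is the procedure for sending back broken goods?")
print(util.cos_sim(query_embedding, other_embedding))
```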

Haystack’s semantic search capabilities focus on natural language. The latest neural language models are all based on the Transformer architecture. This refers to a type of neural network that was first introduced in 2017 in the now legendary Google paper “Attention is All You Need.” There are two things that you should know about Transformers: first, they rely on a mechanism called attention, which allows them to model the relationships between words; second, they lend themselves to parallel computing far better than their predecessors, which makes them much faster to train.

There are now thousands of Transformer-based language models serving a variety of use cases and many different languages. The website of NLP company Hugging Face has become a go-to for finding and downloading pre-trained models. Let’s now take a closer look at some flavors of neural search.

Semantic Document Search

We’ve seen how Transformer-based models can match item pairs by computing their similarity. In semantic document search, we make use of that feature to find the most similar document to a query. Note that a “document” simply refers to a textual entity that we store in our database. Depending on our use case, we may opt to use paragraphs or even sentences, rather than essay-length documents.

In the example below, you can see how a semantic search model is able to grasp the meaning behind the query and return the most relevant documents, in ranked order.
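
Here’s a minimal sketch of semantic document search, assuming the Haystack 1.x API and an example embedding model (the documents are invented):

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever
from haystack.pipelines import DocumentSearchPipeline

document_store = InMemoryDocumentStore(embedding_dim=384)
document_store.write_documents([
    {"content": "A retrospective of the painter's early work opens this week."},
    {"content": "The spacecraft completed its flyby of the outer planets."},
    {"content": "New fossil findings shed light on early mammals."},
])

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    model_format="sentence_transformers",
)
document_store.update_embeddings(retriever)

pipeline = DocumentSearchPipeline(retriever)
result = pipeline.run(query="art exhibition", params={"Retriever": {"top_k": 2}})
for doc in result["documents"]:
    print(f"{doc.score:.2f}  {doc.content}")
```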

Semantic search is closely related to the task of information retrieval. The retrieval task is greatly aided by a new generation of vector-optimized databases and indexing libraries. Projects like FAISS, Weaviate, Jina, and Milvus allow for fast comparisons between vectors, even when there are millions of them.
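
As a small illustration, here’s what a FAISS index over a set of vectors can look like (the dimensionality and random data are placeholders for real document embeddings):

```python
import faiss
import numpy as np

dim = 384  # must match the encoder's output dimensionality
index = faiss.IndexFlatIP(dim)  # exact inner-product search

# In practice, these would be document embeddings from a sentence encoder.
doc_vectors = np.random.random((1000, dim)).astype("float32")
index.add(doc_vectors)

query_vector = np.random.random((1, dim)).astype("float32")
scores, ids = index.search(query_vector, 5)  # top-5 nearest documents
print(ids[0], scores[0])
```

Let us now look at how semantic search powers modern question answering systems.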

Extractive Question Answering

It’s all in the name: a question answering system accepts a question and then returns not a document, but an answer. The term “extractive question answering” alludes to the fact that the system highlights one or more passages (from one or more documents) as answers. In contrast, there’s also generative QA, which generates answers from scratch (albeit still based on a textual knowledge base).

The standard dataset for English-language extractive QA is SQuAD (the Stanford Question Answering Dataset), which consists of documents that have been annotated with questions and the corresponding answer passages. SQuAD is used to “teach” a system how to answer questions. To that end, a Transformer-based language model like BERT is fine-tuned on the SQuAD dataset.
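
To get a feel for what such a fine-tuned model does, here’s a sketch using the Hugging Face transformers library and deepset’s publicly available roberta-base-squad2 model (one example of a model fine-tuned on SQuAD-style data):

```python
from transformers import pipeline

# A RoBERTa model fine-tuned for extractive QA.
qa_model = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = (
    "The Transformer architecture was first introduced in 2017 "
    "in the Google paper 'Attention Is All You Need'."
)
result = qa_model(question="When was the Transformer architecture introduced?", context=context)
print(result)  # {'answer': '2017', 'score': ..., 'start': ..., 'end': ...}
```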

Open-domain QA combines document search with extractive QA in a retriever-reader pipeline. An open-domain QA system can perform SQuAD-style question answering on a large collection of documents. First, the retriever searches for the most relevant documents in response to a query, either through a sparse retrieval method like tf-idf or BM25, or by leveraging a dense, Transformer-based retrieval model.

In the next step, the reader gives the selected documents a closer look by passing them through a pre-trained QA language model. The model then returns the text passages that it deems most likely to answer the query. Here’s an example of a retriever-reader pipeline in action.
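
The sketch below assumes a recent Haystack 1.x release (where BM25Retriever works with the in-memory document store); the documents are invented:

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([
    {"content": "Toni Morrison was born Chloe Ardelia Wofford in Lorain, Ohio."},
    {"content": "Her novel Beloved won the Pulitzer Prize for Fiction in 1988."},
])

retriever = BM25Retriever(document_store=document_store)  # sparse retrieval
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
prediction = pipeline.run(
    query="What is Toni Morrison's full name?",
    params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 1}},
)
print(prediction["answers"][0].answer)  # e.g. "Chloe Ardelia Wofford"
```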

When an open-domain QA system becomes accurate enough, it can function as a general-purpose search tool. For example, Google now returns answers to simple factual questions, where it is very confident in both its understanding of the query and the correctness of its answer. To see this functionality in action, try out a fact-seeking query like “What is Toni Morrison’s full name?”

Next, we’ll look at a special case of question answering, where the matching process works a bit differently from what we’ve seen so far.

FAQ-style Question Answering

Frequently Asked Questions (FAQs) are collections of question-answer pairs meant to address the most common issues that users encounter. If an organization has a well-curated FAQ section, for example, it can ease the load on its customer service team, since more customers will be able to solve problems on their own. Advances in neural search help make FAQ search more efficient.

If you augment your FAQ dataset with semantic document search, users can find the right answer to their query even if their wording doesn’t match the original question-answer pair. In FAQ-style question answering, a Transformer-based sentence encoder compares a high-dimensional representation of your query to the encodings of the questions in the database. It determines the question with the highest similarity to your query and returns the answer to that question as your result.
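
The core of that matching step can be sketched with a sentence encoder (the encoder choice and FAQ pairs are invented for illustration):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example encoder

faq = [
    ("How can I reset my password?", "Click 'Forgot password' on the login page."),
    ("How do I cancel my subscription?", "Go to Settings > Billing and choose 'Cancel'."),
]

# Encode all stored FAQ questions once, up front.
question_embeddings = model.encode([question for question, _ in faq])

# The user's query is worded differently from the stored question.
query_embedding = model.encode("I forgot my login credentials, what now?")

# Find the most similar stored question and return its paired answer.
similarities = util.cos_sim(query_embedding, question_embeddings)[0]
best = int(similarities.argmax())
print(faq[best][1])  # "Click 'Forgot password' on the login page."
```

Check out our blog post on semantic FAQ search to see an FAQ search system in action.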

The field of NLP is constantly evolving and coming up with new language models to solve increasingly complex tasks. In the next section, we’ll look at how to augment your search pipeline with a diverse set of pre-trained models.

Next Generation Neural Search

As you’ll see when browsing the Hugging Face model hub, Transformer-based models allow you to undertake a variety of NLP tasks. How about a model that can translate from the Romance languages into English? Or a summarization model for Bahasa Indonesia? You can find that and much more at Hugging Face’s website.

The most exciting neural search systems today integrate some of these models into their architecture. Composite question answering systems do not stop at the usual retriever-reader pipeline that extracts documents and answers from a document store. Rather, they combine many different components to build ever more powerful systems.

In Haystack, you can easily implement these systems by adding more nodes to your pipelines. For instance, you could add a classifier, a summarization module, a document ranker, and much more. Alternatively, you can choose from a selection of ready-made pipelines to set up a working system in no time. One example of a ready-made pipeline is the TranslationWrapperPipeline. It is useful if you need to translate to and from the language in your database, as in the example below.
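
Here’s a sketch of that wrapper, assuming the Haystack 1.x API, a German-speaking user, and an English document store (the Helsinki-NLP translation models are one possible choice):

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, TransformersTranslator
from haystack.pipelines import DocumentSearchPipeline, TranslationWrapperPipeline

# An English-language document store and retriever, as in the earlier sketch.
document_store = InMemoryDocumentStore(embedding_dim=384)
document_store.write_documents([{"content": "The Eiffel Tower is about 330 metres tall."}])
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    model_format="sentence_transformers",
)
document_store.update_embeddings(retriever)
search_pipeline = DocumentSearchPipeline(retriever)

# Wrap the pipeline with translators: queries come in and results go out in German.
wrapped = TranslationWrapperPipeline(
    input_translator=TransformersTranslator(model_name_or_path="Helsinki-NLP/opus-mt-de-en"),
    output_translator=TransformersTranslator(model_name_or_path="Helsinki-NLP/opus-mt-en-de"),
    pipeline=search_pipeline,
)
result = wrapped.run(query="Wie hoch ist der Eiffelturm?")
```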

To give you an idea of what such a composite system can look like, consider one more example: a pipeline that classifies incoming queries and routes them to different retrievers.
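
This sketch, again assuming the Haystack 1.x API, sends natural-language questions to a dense retriever and keyword queries to a sparse one (the default TransformersQueryClassifier distinguishes between the two):

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, EmbeddingRetriever, TransformersQueryClassifier
from haystack.pipelines import Pipeline

document_store = InMemoryDocumentStore(use_bm25=True, embedding_dim=384)
document_store.write_documents([{"content": "Picasso painted Boy with a Pipe in 1905."}])

sparse_retriever = BM25Retriever(document_store=document_store)
dense_retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    model_format="sentence_transformers",
)
document_store.update_embeddings(dense_retriever)

classifier = TransformersQueryClassifier()  # output_1: questions, output_2: keyword queries

pipeline = Pipeline()
pipeline.add_node(component=classifier, name="QueryClassifier", inputs=["Query"])
pipeline.add_node(component=dense_retriever, name="DenseRetriever", inputs=["QueryClassifier.output_1"])
pipeline.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["QueryClassifier.output_2"])

result = pipeline.run(query="What did Picasso paint in 1905?")
```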

Search and Find — with Haystack!

No matter what kind of search pipeline you’re looking to implement, chances are high that we have just the right building blocks for you. The Haystack framework takes a modular approach to question answering and search pipelines that lets you build systems of any complexity.

And it doesn’t stop there: we’ve designed Haystack as an end-to-end NLP solution that assists you at every step of the implementation process, from setting up a document store to deployment. You can also use our data labeling tools to annotate datasets, and to retrain and scale your models.

Ready to give Haystack a try? Head over to our GitHub repository, and if you like what you see, give us a star :) To see what our community is up to and speak directly to our team members, join our Discord channel.