Save Time and Resources with the Query Classifier for Neural Search

The query classifier in Haystack distinguishes between keyword-based and natural language queries and routes each to the appropriate branch of the pipeline.

Queries come in all shapes and forms. A keyword-based search differs from a question posed in natural language, which itself only vaguely resembles an SQL query. In Haystack, we can account for these differences by integrating a special node into our QA pipeline: the query classifier.

A query classifier puts each incoming query into one of two predefined classes, and routes it to the appropriate section of the search pipeline. It can help you save time and money — by not unnecessarily activating hardware — and at the same time produce more relevant results. Read on to learn all about query classifiers, and how they can make your QA systems more efficient.

When to use a Query Classifier

Haystack question answering systems are typically wrapped in pipeline objects that are easy to build and run. Pipelines are represented as graphs, with nodes and edges directing the flow of information. A pipeline graph can be straightforward, with a single route between the input and output nodes. For more complex use cases, we can add different courses of action, represented by alternative paths in the graph.

To determine which route our input should take, we add a decision node to the pipeline graph. This node decides, after reviewing its input, which route of the graph to send the query to. The image shows how a query classifier works:

For this kind of pipeline, a popular use case is a system that accepts queries both in natural language and as a sequence of one or more keywords. If you’re asking a question in natural language, it’s important that the system can parse it properly. It should know the difference between subject and object, and it should have a grasp of synonyms. The system achieves this thanks to a dense retrieval method based on a deep language model, such as DPR (Dense Passage Retrieval).

With a keyword query, on the other hand, there’s no syntactic structure to decipher. For this use case, we recommend a sparse retrieval method (e.g., ElasticsearchRetriever) that strictly matches the query to the words in the documents. For your system to decide on the optimal retrieval method, it’ll need to know which of the two kinds of queries it’s receiving.

There are many other use cases for a query classifier. You might want to skip the QA model for keyword queries and just return documents, adapt the classifier to distinguish questions from statements, or determine a question’s topic and use that knowledge to further filter your database. Note, however, that the default implementation of Haystack’s query classifiers is equipped to handle only binary problems.

Types of Query Classifiers

Classifiers vary significantly in complexity. For instance, you could treat the presence of a question mark as a signal that a query is a natural language question rather than a keyword query. But you’re probably better off working with a machine learning model that’s been trained for the task. Haystack offers two classifier classes out of the box, both of which can be initialized with different trained models:
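To make the contrast concrete, here’s what such a hand-written rule might look like (a purely illustrative sketch, not part of Haystack):

def is_question(query: str) -> bool:
    # Naive heuristic: anything that ends in "?" or starts with a question word
    # is treated as a natural language question
    question_words = ("who", "what", "where", "when", "why", "how", "which")
    query = query.strip().lower()
    return query.endswith("?") or query.startswith(question_words)

A rule like this is easy to write but brittle. As we’ll see later in this post, a query like “What name pet Ron?” contains both a question word and a question mark, yet a trained model correctly treats it as a keyword query.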

SklearnQueryClassifier

scikit-learn (sklearn for short) is the machine learning engineer’s multi-purpose toolbox. It implements practically every popular classification model (see full list). By default, Haystack’s SklearnQueryClassifier class loads a Gradient Boosting Classifier that’s been pre-trained to distinguish between keyword and semantic queries.

Alternatively, you can train your own model on a binary classification task and pass it to the classifier during initialization. Don’t forget to also include the sklearn vectorizer used for turning the raw texts into bag-of-word vectors during training.
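As a rough sketch of that workflow (the tiny training set is purely illustrative, and we’re assuming the model_name_or_path and vectorizer_name_or_path parameters accept local paths to pickled artifacts):

import pickle
from pathlib import Path

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

from haystack.pipeline import SklearnQueryClassifier

# Toy training data: 1 = natural language question, 0 = keyword query
queries = ["Who is Harry Potter's godfather?", "harry potter godfather",
           "Where does Dumbledore teach?", "dumbledore school"]
labels = [1, 0, 1, 0]

# Turn the raw texts into bag-of-word vectors and fit a binary classifier
vectorizer = TfidfVectorizer()
model = GradientBoostingClassifier().fit(vectorizer.fit_transform(queries), labels)

# Persist both artifacts and hand them to the query classifier
pickle.dump(model, open("query_model.pickle", "wb"))
pickle.dump(vectorizer, open("query_vectorizer.pickle", "wb"))

query_classifier = SklearnQueryClassifier(
    model_name_or_path=Path("query_model.pickle"),
    vectorizer_name_or_path=Path("query_vectorizer.pickle"),
)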

TransformersQueryClassifier

Transformers are behind the impressive results of our neural search modules and virtually every other modern NLP application. Our TransformersQueryClassifier class can use a variety of transformer models — as long as they’ve been trained on a binary classification task.

As with the SklearnQueryClassifier, you can also pass your own pre-trained model, or select one from the Hugging Face Model Hub. By default, the TransformersQueryClassifier loads a mini BERT model that’s fine-tuned on the keyword-vs-questions task. You can find it on Hugging Face under the name “shahrukhx01/bert-mini-finetune-question-detection.”
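For example, loading the default model explicitly, or swapping in any other binary sequence classification model from the Model Hub, looks like this:

from haystack.pipeline import TransformersQueryClassifier

# The default keyword-vs-question model; replace the name with any other
# binary classification model from the Hugging Face Model Hub
keyword_classifier = TransformersQueryClassifier(
    model_name_or_path="shahrukhx01/bert-mini-finetune-question-detection"
)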

Example: Keyword vs. Natural Language Queries

To set up our custom pipeline, we first preprocess the documents. In our example, we’ll be working with a Harry Potter dataset that we’ve split into sequences of 250 words (a compromise between the recommendations for dense and sparse retrieval methods). For more detail on this step, check out our post on parameter-tweaking.
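The preprocessing itself isn’t shown here; as a rough sketch (the dataset path is a placeholder, and we’re assuming Haystack’s convert_files_to_dicts and PreProcessor utilities), it might look like this:

from haystack.preprocessor import PreProcessor
from haystack.preprocessor.utils import convert_files_to_dicts

# Read the raw text files into Haystack's dictionary format
docs_raw = convert_files_to_dicts(dir_path="data/harry_potter")

# Split each document into passages of roughly 250 words
preprocessor = PreProcessor(split_by="word", split_length=250,
                            split_respect_sentence_boundary=True)
docs = []
for doc in docs_raw:
    docs.extend(preprocessor.process(doc))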

We’re now ready to set up our document store and feed the documents to the store:

from haystack.document_store import ElasticsearchDocumentStore

# Connect to a running Elasticsearch instance, remove any stale documents,
# and write the preprocessed passages
document_store = ElasticsearchDocumentStore()
document_store.delete_documents()
document_store.write_documents(docs)

Setting up the query classifier is straightforward:

from haystack.pipeline import SklearnQueryClassifier

query_classifier = SklearnQueryClassifier()

The classifier is initialized with the default pre-trained Gradient Boosting model that’s been trained to distinguish between keyword and natural language queries.

Next, we initialize the two retrievers (one dense and one sparse) and compute the document embeddings with the dense retriever:

from haystack.retriever import ElasticsearchRetriever
from haystack.retriever.dense import DensePassageRetriever

# Sparse retriever for keyword queries, dense retriever for natural language queries
es_retriever = ElasticsearchRetriever(document_store)
dpr_retriever = DensePassageRetriever(document_store=document_store)

# Compute and store the dense passage embeddings for all indexed documents
document_store.update_embeddings(dpr_retriever)

Finally, we set up the reader with a trained RoBERTa SQuAD model:

from haystack.reader import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2",
                    use_gpu=True, return_no_answer=True)

All that’s left is to combine all of our components in one pipeline:

from haystack import Pipeline

pipeline = Pipeline()
pipeline.add_node(component=query_classifier, name="QueryClassifier", inputs=["Query"])
# output_1 (natural language queries) goes to the dense retriever,
# output_2 (keyword queries) goes to the sparse retriever
pipeline.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
pipeline.add_node(component=es_retriever, name="ESRetriever", inputs=["QueryClassifier.output_2"])
pipeline.add_node(component=reader, name="QAReader", inputs=["ESRetriever", "DPRRetriever"])

Alright, our pipeline is ready! Let’s have a quick look at the pipeline’s graph to see if we’ve missed anything. To do so, we simply call the draw() method on our pipeline object, which stores a .png image of the graph in the same directory. (To use this method, you’ll need to have pygraphviz installed.)

pipeline.draw() produces this image:

Just as we wanted, our pipeline branches out as the query classifier routes the queries to the two different retrievers, whose results are then joined in the Reader node. Let’s now finally test out the pipeline:

result = pipeline.run(query="What is the name of Ron's pet?", top_k_retriever=20, top_k_reader=3)

The pipeline returns a dictionary object, which in turn contains a list of dictionaries with our answers and other information. We’ll use a list comprehension to extract only the answer passages and their probability values:

[(dicty['answer'], round(dicty['probability'], 2)) for dicty in result['answers']]

>>> [('Scabbers', 0.8), ('Pigwidgeon', 0.77), ('Pigwidgeon', 0.78)]

Let’s ask the same question using the keywords “Ron,” “pet,” and “name”:

result = pipeline.run(query="Ron pet name", top_k_retriever=20, top_k_reader=3)
[(dicty['answer'], round(dicty['probability'], 2)) for dicty in result['answers']]

>>> [('Pigwidgeon', 0.8), ('Scabbers', 0.88), ('Scabbers', 0.6)]

The answers are definitely different, but so were the queries. How can we be sure that the routing worked and the two questions were indeed handled by different retrievers? Let’s look at the QueryClassifier node in isolation to see what’s going on behind the scenes. We’ll need to extract the node from the pipeline using the get_node() method:

classifier_node = pipeline.get_node('QueryClassifier')
classifier_node.run(query="What's the name of Ron's pet?")

>>> ({'query': "What's the name of Ron's pet?"}, 'output_1')

The node returns an ‘output_1’ label for our input, which is what we told our dense retriever to expect. How about a keyword query?

classifier_node.run(query="Ron pet name")

>>> ({'query': 'Ron pet name'}, 'output_2')

Our query receives an ‘output_2’ label, which means that it’ll be passed on to the sparse retriever. Is it just the question mark that helps the classifier distinguish between the two query types? Let’s check:

classifier_node.run(query="What's the name of Ron's pet")

>>> ({'query': "What's the name of Ron's pet"}, 'output_1')

Even without the question mark, the classifier still recognizes the query as a question. Let’s make it even harder for the classifier and add two interrogative markers to our keyword query: a question word and a question mark.

classifier_node.run(query="What name pet Ron?")

>>> ({'query': 'What name pet Ron?'}, 'output_2')

Nope, no chance. If there’s no sentence structure, our query classifier won’t classify a query as a natural language question.

Saving Time and Resources by Rerouting Your Keyword Queries

The reader with its QA model is the most expensive component of a question-answering pipeline — both computationally and time-wise. However, in the case of keyword queries, you might want to perform a regular document search instead of employing the full semantic search pipeline. In that case, you can easily change the architecture of your pipeline so that only the natural language queries are routed to the reader. To achieve such routing, set up your reader node as follows:

pipeline.add_node(component=reader, name="QAReader", inputs=["DPRRetriever"])
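
Putting the pieces together, the rerouted pipeline is built exactly like before; only the reader’s inputs change (sketch):

pipeline = Pipeline()
pipeline.add_node(component=query_classifier, name="QueryClassifier", inputs=["Query"])
pipeline.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
pipeline.add_node(component=es_retriever, name="ESRetriever", inputs=["QueryClassifier.output_2"])
# Only the dense (natural language) branch feeds the reader; keyword queries
# stop at the ESRetriever and return plain document results
pipeline.add_node(component=reader, name="QAReader", inputs=["DPRRetriever"])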

Let’s use our draw() method again to see how the graph has changed:

Our internal tests have shown that by skipping the reader model, a keyword query is processed about 80 times as fast as a natural language query.

Example: Questions vs. Statements

The TransformersQueryClassifier can handle even more complex models. Let’s briefly look at how a BERT model can distinguish between questions and statements. We won’t be building the entire pipeline. We’ll just set up the classifier object and run it with a couple of queries:

from haystack.pipeline import TransformersQueryClassifier

statement_classifier = TransformersQueryClassifier(model_name_or_path='shahrukhx01/question-vs-statement-classifier')

Let’s start with two simple examples as a sort of sanity check:

statement_classifier.run(query="Maclunkey is my name")

>>> ({'query': 'Maclunkey is my name'}, 'output_2')

Question or statement? This one’s pretty obvious, and we now know that ‘output_2’ corresponds to a statement. Does this mean that ‘output_1’ stands for a question?

statement_classifier.run(query="Who shot first?")

>>> ({'query': 'Who shot first?'}, 'output_1')

Looks like it. Now what happens if we feed the classifier a statement that looks like a question?

statement_classifier.run(query="It doesn't really matter, does it?")

>>> ({'query': "It doesn't really matter, does it?"}, 'output_2')

The classifier correctly classifies it as a statement rather than a question. Semantically, a question requires an answer to fill an information gap, so the assessment here is correct.

Finally, let’s look at the reverse case, where a question is cloaked in a statement-like sentence:

statement_classifier.run(query="Who was it that shot first, I wonder.")

>>> ({'query': 'Who was it that shot first, I wonder.'}, 'output_1')

Our classifier cannot be fooled. It correctly recognizes that the last query was actually looking for information.

Let’s see where this classifier might be useful. Imagine you’re building a chatbot that can converse and answer questions. You don’t want your system looking for answers to non-questions. Otherwise, the question-answering pipeline would be triggered in vain, and even worse, it would give an answer where none was required, making the system less convincing as a conversational agent.

Another common use-case would be enterprise search built on top of Elasticsearch. If you’d like to review the core components of a Haystack enterprise search system, have a look at this article which covers them extensively.

Including a query classifier lets your system make smarter decisions before triggering the entire question-answering pipeline. This not only improves your system’s accuracy but also minimizes total computation effort and cost.

Boost Your QA System’s Efficiency with Query Classifiers

Looking to get better results from your question answering pipelines? As we’ve seen, query classifiers leverage the power of machine learning models to achieve that goal.

Check out our GitHub repository to get started with Haystack today. We’d appreciate a star too! :)