Apply Haystack to Question Answering

Question answering is one of the most common NLP tasks that large enterprises deploy in their internal IT systems.

We ask questions to better understand the world around us. Whether we’re trying to learn more about a person or a topic, posing a question allows us to get the information that we need to form a clearer picture. In the context of information retrieval from a collection of documents, you can similarly use questions to better understand your textual data, or to extract specific information from that data. 

Automated systems for question answering (QA) have undergone a dramatic overhaul. Thanks to Transformer-based language models and semantic search, these systems are now able to accept a query in natural language and extract an adequate and informative answer from a large collection of documents—all within seconds.

Assisting customer service, implementing an internal QA system for your employees, and using QA to create new metadata are just some applications of question answering systems. Textual data accumulates by the second, and it’s increasingly important to be able to make sense of that data through a natural language interface.

There are many flavors of question answering. One important distinction is between extractive and generative QA. With extractive QA, the answer returned by the system is a passage from your corpus. Generative QA, on the other hand, uses natural language generation (NLG) to form new answers from scratch.

Haystack implements question answering systems by way of pipelines that are both highly usable and customizable. You may choose between designing your own pipelines or using templates. The popular ExtractiveQAPipeline template, for instance, joins a Retriever node (which fetches the relevant documents) and a Reader node (which identifies answer passages in response to a query). You need only define the types of retriever and reader you want to use for the task.

Haystack’s flexibility allows you to build the question answering pipeline that’s just right for your use case. We designed Haystack as a toolbox, with nodes for classification, translation, summarization, and many other applications. Our framework’s mix-and-match approach lets you plug in any node you want into your customized pipelines. Tell us what it is you’re looking for, and we’ll help you build the pipeline that can answer all your questions.

Try Haystack now!

Frequently asked questions

  • How does an extractive question answering pipeline work? A basic extractive QA system combines a Retriever and a Reader. The retriever matches the incoming query against the entire document collection and returns the documents with the highest relevance scores. This can be done using either sparse, keyword-based methods or dense, Transformer-based retrieval techniques. The reader module, on the other hand, always leverages a Transformer-based language model like BERT or RoBERTa. To perform question answering, that model needs to be fine-tuned on a question answering dataset like SQuAD 2.0. The reader sees only the documents selected by the retriever. It “reads” those documents closely and returns the answer passages that it deems most suitable to the query.

  • Can I use question answering in languages other than English? Of course! On the Hugging Face Model Hub (the community’s go-to place for finding Transformer-based language models), you can find question answering models for many different languages, including Chinese, French, Hindi, and Turkish. At deepset, we have released the GermanQuAD dataset for question answering in German (plus the trained QA model). You can even train your own models. Keep in mind that the more data you have, the better your model will perform. That’s why both unannotated data (for training Transformer-based language models) and annotated data (for fine-tuning to a specific task) are so important in NLP.

  • How do I evaluate a question answering system? Since question answering pipelines are composite, complex systems, evaluating them is no easy task. You may want to evaluate the entire pipeline or evaluate its modules separately, and the two cases require different metrics. For the reader, there are exact match (EM), F1 (the harmonic mean of precision and recall), and deepset’s Semantic Answer Similarity (SAS), which uses a Transformer-based approach. For more detail, check out our blog post on evaluating question answering systems.
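To make the first two reader metrics concrete, here is a small, self-contained sketch of SQuAD-style exact match and token-level F1. The normalization shown (lowercasing, stripping punctuation and articles) follows the common SQuAD evaluation convention; SAS is omitted because it requires a trained Transformer model:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    """EM: 1 if the normalized strings are identical, else 0."""
    return normalize(prediction) == normalize(gold)

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "Eiffel Tower"))  # True after normalization
print(round(f1_score("in Paris, France", "Paris"), 2))  # 0.5
```

Note how F1 gives partial credit for an answer that overlaps the gold answer, while EM is all-or-nothing; this is why the two metrics are usually reported together.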