Generate Questions Automatically for Faster Annotation Workflows

Let Transformer models do the annotation work for you with Haystack’s new Question Generator


Without labeled datasets there would be no supervised machine learning: they provide the ground truth needed to build predictive models. Trained annotators spend many hours on dataset creation, a painstaking and monotonous task. But while machine learning has managed to automate many tasks that previously depended on human expertise, data labeling itself has been a holdout of the automation wave. That is, until recently.

In our previous article, we presented our annotation tool, which greatly accelerates the manual annotation of question answering (QA) data. But what if we told you that you could take your annotation process a step further with the help of automation?

Read on to find out about the Question Generator and how you can use it to facilitate your annotation process, including best practices for a “human-in-the-loop” workflow.

What Is Question Generation?

Question generation is the process of automatically creating questions from a text. It is typically concerned with generating fact-seeking questions, such as those involving people, locations, or dates.

A successful question generation model has to account for both syntax and semantics. It’s not enough for the model to generate grammatically correct questions — it also needs to grasp which parts of a text are central and make for interesting questions. Question generation is thus closely related to natural language understanding and text summarization.

When Is Question Generation Useful?

Let’s look at three use cases where a question generator would come in handy.

1. Learning Material

A classic use case for question generation is the automated creation of learning material. In this scenario, the question generator comes up with questions that are used to check whether the learner comprehended a given text.

2. Data Exploration

A question generation use case with which you’re probably familiar is automatic question suggestion. The Google search engine uses question suggestions to help users navigate and explore data. For certain queries, Google returns a “People also ask” drop-down box with automatically generated, related questions. When you click on a question, the box expands and highlights the corresponding answer span within a text. It turns out that this is a great way to find out what other topics your documents cover.

3. Automated Annotation of Question-Answer Datasets

Another key use case — and the one we will focus on in this article — is question generation in the context of annotation. Annotating datasets is an arduous and expensive process that often requires human annotators to manually label every data point. At the same time, representative and high-quality datasets are central to machine learning: not only do they allow you to build models that can generalize, but they’re also needed to evaluate your system’s performance.

Using the Question Generator to extract questions from documents can save annotators a lot of time. With the questions automatically generated, annotators only need to check question quality: they keep relevant and well-formed questions, and discard or correct irrelevant or ill-formed ones. In this workflow, the annotators then manually mark the answer passages for the generated questions. However, by combining the Question Generator with the Reader, you can automate the answer span labelling as well, as our example below will demonstrate.

How Does Haystack’s Question Generator Work?

The Question Generator uses a Transformer-based language model trained on a large number of question-answer pairs. The default model is a version of the T5 model, which excels at creating questions from general-domain texts. If your documents come from a specialized domain — say, law or medicine — or are in a language other than English, you can plug in a different model. The Hugging Face model hub is a good place for finding models for a variety of use cases.

Using the Question Generator is straightforward. We first import the class and initialize it:

from haystack.nodes import QuestionGenerator

question_generator = QuestionGenerator(model_name_or_path="valhalla/t5-base-e2e-qg")

We then create a text snippet from which to generate questions:

text = """Macaroni and cheese is a dish of macaroni that is covered in a cheese sauce. It can be bought packaged. It was introduced in America by Thomas Jefferson's enslaved chef, James Hemings, in 1803."""

Finally, we run the question generator on our text:

question_generator.generate(text)

>>> [' What is a dish of macaroni that is covered in a cheese sauce?',
 ' When was macaroni and cheese introduced in America?',
 ' Who introduced macaroni and cheese in America in 1803?',
 ' What is the name of the dish that is topped with a sauce of cheese?']

In just a few lines of code, we managed to create four semantically and syntactically well-formed questions from scratch!
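
Note that every generated question above starts with a space. Before passing questions on to annotators, it can help to tidy them up. The following is a minimal sketch in plain Python, assuming the generator returns a list of strings as shown; `clean_questions` is a hypothetical helper for illustration and not part of Haystack. It strips surrounding whitespace and drops exact repeats (ignoring case), in case the same question shows up twice.

```python
def clean_questions(questions):
    """Strip surrounding whitespace and drop case-insensitive duplicates, keeping order."""
    seen = set()
    cleaned = []
    for question in questions:
        text = question.strip()
        key = text.lower()
        # Skip empty strings and questions we have already kept
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


# Example input modeled on the output above; the third entry is a made-up repeat
raw_questions = [
    " What is a dish of macaroni that is covered in a cheese sauce?",
    " When was macaroni and cheese introduced in America?",
    " when was macaroni and cheese introduced in America?",
]
cleaned_questions = clean_questions(raw_questions)
```

The first occurrence of each question is kept, so the original order of the generator output is preserved.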

Note that the Question Generator is different from the AnswerGenerator. Although both components generate text, the AnswerGenerator is used to generate answers rather than questions.

How to Generate Questions in Haystack

When it comes to integrating the Question Generator into your annotation workflow, you might want to use one of our ready-made question generation pipelines. Haystack has two main pipeline classes that will get your question generation systems up and running in no time:

Question Generation

QuestionGenerationPipeline is a wrapper for the Question Generator. This pipeline has the same functionality as the Question Generator: Documents in, questions out.

Question and Answer Generation

QuestionAnswerGenerationPipeline combines the Question Generator with the Reader. In this pipeline, the Question Generator first creates questions in the manner we saw above. The Reader then returns the answers, as well as the contexts from which the answers were extracted. You can use this pipeline to create entire synthetic question answering datasets in a fully automated process. We’ll show you how in the example below.

Practical Example: Annotate Question Answering Datasets Automatically

You can use question generation to augment an existing QA dataset with new data, or even create an entire dataset from scratch. Let’s see how with the QuestionAnswerGenerationPipeline. We’ll be working with a small collection of food-related texts from Wikipedia:

texts = [
    {"text": "Macaroni and cheese is a dish of macaroni that is covered in a cheese sauce. It can be bought packaged. It's an American dish, introduced to America by Thomas Jefferson's enslaved chef, James Hemings, in 1803."},
    {"text": "Falafel is a kind of vegetarian food. It is a deep-fried ball or patty made from ground chickpeas, fava beans, or both. The dish originally came from Egypt."},
    {"text": "Ravioli is a type of Italian food pasta dish. It is usually two layers of pasta dough with a filling between the two layers. There are many different recipes, with different kinds of fillings. The most common fillings are meat, vegetables or blackboard cheese."},
]

Let’s start by importing and initializing the individual components:

from haystack.nodes import QuestionGenerator, FARMReader

question_generator = QuestionGenerator()
reader = FARMReader("deepset/roberta-base-squad2")

Next, we’ll place the two nodes into the pipeline:

from haystack.pipelines import QuestionAnswerGenerationPipeline

qag_pipeline = QuestionAnswerGenerationPipeline(question_generator, reader)

Now we’ll run the pipeline on our small text corpus and store the results in a list:

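results = []

# Assumption: document_store is a Haystack document store that already holds the texts above.
for doc in document_store:
    # The keyword argument name differs between Haystack versions; document=doc is an assumption.
    results.append(qag_pipeline.run(document=doc))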

Finally, we can print the automatically generated question-answer pairs:

for doc in results:
    for qa_pair in doc["results"]:
        print("Question: {}\nAnswer: {}\n".format(qa_pair["query"], qa_pair["answers"][0]["answer"]))

This outputs the following question-answer pairs:

>>> Question: What is a dish of macaroni covered in a cheese sauce?
Answer: Macaroni and cheese

Question: What is the name of the common American food introduced to America by Thomas Jefferson’s enslaved chef, James Hemings?
Answer: Macaroni and cheese

Question: In what year was Macaroni and cheese introduced?
Answer: 1803

Question: What type of food is falafel?
Answer: vegetarian

Question: What is a deep-fried ball or patty made from ground chickpeas, fava beans or both?
Answer: Falafel

Question: Where did Falafel originate?
Answer: Egypt

Question: What is a type of Italian food pasta dish?
Answer: Ravioli

Question: What is usually two layers of pasta dough with a filling between the two layers?
Answer: Ravioli

Question: There are many different recipes with different kinds of what?
Answer: fillings

Question: What are the most common fillings?
Answer: meat, vegetables or blackboard cheese

And there it is: our entirely machine-generated question-answer dataset! As you can see, not all of the question-answer pairs are well-formed. The main problem seems to lie with specificity: For example, the question of “What are the most common fillings?” can hardly be answered without specifying the type of food it is about. Similarly, the question of “There are many different recipes with different kinds of what?” simply does not provide enough context for an adequate answer.
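
The specificity problem described above can be caught with a crude pre-filter before a human looks at the data. The sketch below is one possible heuristic, not a Haystack feature: it flags a question-answer pair when neither the question nor the answer mentions the subject of the source document. The function `is_underspecified` and its `subject_terms` parameter are hypothetical, and the subject terms are assumed to be supplied by hand for each document.

```python
def is_underspecified(question, answer, subject_terms):
    """Return True if neither the question nor the answer mentions the document's subject."""
    combined = f"{question} {answer}".lower()
    return not any(term.lower() in combined for term in subject_terms)


# Question-answer pairs taken from the ravioli document above
ravioli_pairs = [
    ("What is a type of Italian food pasta dish?", "Ravioli"),
    ("There are many different recipes with different kinds of what?", "fillings"),
    ("What are the most common fillings?", "meat, vegetables or blackboard cheese"),
]

# Collect the questions that a reviewer should look at first
flagged = [
    question
    for question, answer in ravioli_pairs
    if is_underspecified(question, answer, subject_terms=["ravioli"])
]
```

A substring check like this will miss paraphrases and pronouns, so it is best treated as a way to prioritize pairs for review rather than as a replacement for the reviewer.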

As these examples illustrate, synthetic question answering dataset generation works best when the process is human supervised. A supervised setup will still save annotators a lot of time, without compromising the quality of the generated data. It can even help improve your existing language models, as we’ll explain below.

Human in the Loop: Making the Most Out of Automated Question Answering Annotation

To avoid adding faulty question-answer pairs to your datasets, it’s good practice to incorporate a human supervisor in the automated annotation loop. What’s more, a human-in-the-loop scenario can even help you create more robust models. We describe the process below.

The human annotator reviews automatically generated question-answer pairs and makes corrections as necessary. When it comes to questions, the annotator can filter out faulty ones, like the ones we’ve seen above. Additionally, the annotator can manually correct wrong answers. This step helps identify cases where the language model has trouble finding the right answers. You can then retrain your language model with the manually amended dataset, to make sure the model doesn’t repeat the same mistakes.
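
The review step can be sketched as a small function that applies an annotator's decisions to the generated pairs. This is a minimal illustration in plain Python under stated assumptions: the pair format (`question` and `answer` keys), the `apply_review` helper, and the decision format (`keep`, `discard`, `correct`) are all hypothetical and do not mirror the annotation tool or any Haystack API. The corrected question in the example is likewise made up.

```python
def apply_review(qa_pairs, decisions):
    """Apply annotator decisions to generated pairs.

    qa_pairs: list of dicts with "question" and "answer" keys.
    decisions: dict mapping a pair's index to {"action": ...}, where the action is
    "keep", "discard", or "correct" (with optional "question"/"answer" overrides).
    Pairs without a decision are kept unchanged.
    """
    reviewed = []
    for index, pair in enumerate(qa_pairs):
        decision = decisions.get(index, {"action": "keep"})
        if decision["action"] == "discard":
            continue
        amended = dict(pair)  # copy so the generated data stays untouched
        if decision["action"] == "correct":
            # Overwrite only the fields the annotator supplied
            for field in ("question", "answer"):
                if field in decision:
                    amended[field] = decision[field]
        reviewed.append(amended)
    return reviewed


generated = [
    {"question": "Where did Falafel originate?", "answer": "Egypt"},
    {"question": "What are the most common fillings?", "answer": "meat, vegetables or blackboard cheese"},
    {"question": "There are many different recipes with different kinds of what?", "answer": "fillings"},
]
decisions = {
    1: {"action": "correct", "question": "What are the most common ravioli fillings?"},
    2: {"action": "discard"},
}
reviewed = apply_review(generated, decisions)
```

Keeping the generated pairs and the decisions separate means the amended dataset can be rebuilt at any time, and the decisions themselves record where the model went wrong.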

A human annotator can also introduce greater lexical and syntactic variety into the automatically annotated datasets. The Question Generator often sticks to the original text verbatim. If an input text is about macaroni and cheese, the odds are that the generated question will also mention “macaroni and cheese,” whereas a human annotator might say “mac and cheese” or “macaroni casserole.”
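
One rough way to see how closely a question follows its source is to measure word overlap. The sketch below computes the fraction of a question's words that also occur in the source text. It is an illustrative metric only; `verbatim_overlap` is a hypothetical helper, the tokenization is deliberately simple, and the second question is a made-up example of what a human annotator might write.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def verbatim_overlap(question, source_text):
    """Fraction of the question's word tokens that also occur in the source text."""
    question_tokens = tokenize(question)
    if not question_tokens:
        return 0.0
    source_tokens = set(tokenize(source_text))
    shared = sum(1 for token in question_tokens if token in source_tokens)
    return shared / len(question_tokens)


source = "Macaroni and cheese is a dish of macaroni that is covered in a cheese sauce."

# A machine-generated question from earlier in the article
generated_score = verbatim_overlap(
    "What is a dish of macaroni that is covered in a cheese sauce?", source
)
# A hypothetical question written by a human annotator
human_score = verbatim_overlap(
    "Which pasta casserole comes with a cheese sauce?", source
)
```

A score close to 1.0 suggests the question was lifted almost word for word from the text, which makes it a good candidate for rephrasing during review.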

Adding human supervision to the annotation loop thus goes a long way towards building more diverse datasets, as well as improving the performance of your models.

Explore Question Generation and More in Haystack

Excited to try out the new Question Generation module that is now part of our Haystack NLP framework? Head over to our GitHub repository to learn more about question answering, semantic search, summarization, and other NLP components in Haystack. If it works for you, we would appreciate a star!