Improving On-site Search for Government Agencies: Etalab Case Study

How a team of French developers and researchers built NLP-powered search systems to ease and improve information retrieval on government websites

Government agencies house large collections of data—from medical records to crime statistics, to information about public transport use. Giving citizens access to these records improves data literacy and encourages participation in the political process. But the drive towards data accessibility does not stop at making data public. Administrations also invest in on-site search systems to allow public officers or users of a public service to find the data they’re looking for—simply by asking questions in natural language.

Inside the French Interministerial Digital Directorate (DINUM), the Lab IA of Etalab works towards making public data more accessible by implementing natural language processing (NLP) strategies for French government agencies. For this case study, we spoke with Lab IA team members to learn how they used Haystack to develop intelligent on-site search and NLP-powered question answering systems for government agencies—and how they managed to enrich the field of French NLP with their models along the way. 

What Is Lab IA?

Etalab is a French government task force specializing in data policy that is part of the State’s broader digitalization strategy led by the French Interministerial Digital Directorate (DINUM). One of Etalab’s teams is Lab IA (“The Laboratory for Artificial Intelligence,” in English), which is tasked with implementing AI and machine learning techniques within government agencies. Lab IA collaborates closely with INRIA (the National Institute for Research in Computer Science and Automation), which is well known for its early contributions to the Python machine-learning library scikit-learn.

Guillaume Lancrenon is a lead developer working with Etalab. When he joined the project, he immediately recognized the challenge of providing government agencies with a tool for on-site search that was both flexible and easy to use. “There aren’t a lot of software developers inside the different government ministries,” he says. “We wanted to create website search software that the ministries could implement quickly and easily.” Currently, those ministries still rely on proprietary search tools, which often use a keyword-based approach and thus aren’t able to capture the semantics of a query. That’s exactly what he and his team set out to transform.

Lab IA's Project to Develop Domain-specific On-Site Search Systems

To improve the keyword-based website search systems of government agencies, Guillaume conducted market research and quickly landed on Haystack. “I was looking for a semantic question answering tool that would make it easy to use our own AI models,” he explains. Project manager Robin Reynaud adds: “Because we were building information retrieval tools for many different government bodies, we needed to use our models in more than one context.” 

To build these various on-site search systems, the team had to understand the main differences between the ministries’ textual data. In fact, they quickly learned that each body structured its documents differently. “The knowledge bases are organized very differently,” Robin explains. “Sometimes you have titles that are really meaningful, so you want to include them. Or the documents are very long, so you have to split them correctly. Some agencies have FAQs, which require a special pipeline setup. In our experience, these variables vary significantly between the ministries.” That’s where Haystack provided the greatest value for the team. “Haystack allowed us to easily build domain-specific question answering pipelines for many different contexts.”

At the time, there weren’t many datasets for question answering in French. So the team set about creating their own resources, with the goal of building robust yet easily customizable on-site search systems for the various government bodies.

Lab IA’s Resources for French NLP

Annotated datasets fuel modern machine-learning models. For a specialized task such as question answering, one can fine-tune a general pre-trained language model for domain-specific use cases. Fine-tuning requires considerably less data than training a model from scratch. Lab IA used CamemBERT, a RoBERTa-like model for French, as its base. When the team started out, there was only one question-answering dataset available for French: SQuAD-FR, a machine-translated version of the English SQuAD dataset. The team decided to establish their own native dataset modeled on SQuAD, which they called PIAF (“For the Francophone AI Community,” in English).

Unlike SQuAD, which was developed through the effort of crowdworkers on the Amazon Mechanical Turk platform, the PIAF dataset relies on volunteer annotators. The Lab IA team strove to make their annotation platform as user-friendly as possible and ran several “annotathons” to train contributors on the annotation task. Today, PIAF has 9,200 annotated question-answer pairs. To learn more about French-language NLP resources, have a look at this Etalab blog post (in English).

The PIAF dataset can be used to adapt the CamemBERT language model to the question answering task. But a fine-tuned model alone does not guarantee an optimal search system: pipeline parameters have to be adapted to each project individually. The Lab IA team therefore designed a framework for automatic parameter tuning called PiafML, which helps them find the optimal parameters for each individual pipeline.
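To make this concrete, here is a minimal sketch of what adapting CamemBERT to question answering with a SQuAD-format dataset like PIAF can look like, using Haystack’s FARMReader (1.x-style API). The file names, epoch count, and output directory are illustrative assumptions, not Lab IA’s actual training configuration:

```python
from haystack.nodes import FARMReader

# Start from the French CamemBERT base model and fine-tune it on a
# SQuAD-format QA dataset (question, context, answer span), such as PIAF.
reader = FARMReader(model_name_or_path="camembert-base", use_gpu=True)

reader.train(
    data_dir="data",                   # directory holding the annotation files (illustrative)
    train_filename="piaf_train.json",  # SQuAD-format training set (illustrative name)
    n_epochs=2,
    save_dir="models/camembert-piaf-reader",
)
```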

Modern semantic search systems, like the ones the Lab IA team uses to implement their on-site search tools, are built around a retriever-reader pipeline. The retriever selects a predefined number of documents from a large database, which the reader then scrutinizes for answers to a given query.
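In Haystack (again, 1.x-style API), such a pipeline can be assembled in a few lines. The sketch below uses an in-memory document store, a TF-IDF retriever, and Etalab’s publicly released PIAF reader model as stand-ins; a production deployment would index a real knowledge base, typically in a store like Elasticsearch:

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import TfidfRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# Index a couple of toy documents; a real system indexes the ministry's knowledge base.
document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "Le passeport se renouvelle en mairie, sur rendez-vous."},
    {"content": "La carte grise se demande en ligne sur le site de l'ANTS."},
])

# The retriever narrows the corpus down to a handful of candidates;
# the reader then scans those candidates for an exact answer span.
retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="etalab-ia/camembert-base-squadFR-fquad-piaf")

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
result = pipeline.run(
    query="Comment renouveler un passeport ?",
    params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 1}},
)
print(result["answers"][0].answer)
```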

Although Lab IA could leverage Haystack’s pipelines to match the needs of the various government bodies, the team still struggled with tuning them. Every pipeline has parameters that can be adjusted individually, such as the number of documents to pass to the reader, the document length, and whether to include titles. Once the ideal parameters have been identified and tested on the knowledge base, PiafML creates a YAML file that defines the individual configuration for each ministry. That file can be used to deploy the on-site search system directly on the ministry’s website.
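A drastically simplified version of that tuning loop might look like the following. The parameter grid, the labeled questions, and the accuracy criterion are purely illustrative (PiafML also explores preprocessing choices such as document splitting and title handling), and `pipeline` is the retriever-reader pipeline from the previous sketch:

```python
from itertools import product

# Tiny labeled set of (question, substring expected in the answer).
# Illustrative only; real evaluation sets are far larger.
labeled = [
    ("Comment renouveler un passeport ?", "mairie"),
    ("Où demander une carte grise ?", "ANTS"),
]

best_accuracy, best_params = 0.0, None
for retriever_k, reader_k in product([3, 5, 10], [1, 3]):
    hits = 0
    for question, expected in labeled:
        result = pipeline.run(
            query=question,
            params={"Retriever": {"top_k": retriever_k},
                    "Reader": {"top_k": reader_k}},
        )
        # Count a hit if any returned answer contains the expected substring.
        hits += any(a.answer and expected in a.answer for a in result["answers"])
    accuracy = hits / len(labeled)
    if accuracy > best_accuracy:
        best_accuracy, best_params = accuracy, (retriever_k, reader_k)

print(f"Best parameters: {best_params} (accuracy {best_accuracy:.2f})")
```

Once a winning configuration is found, it can be written into a YAML pipeline definition; in Haystack 1.x, such a file can then be loaded on the target website’s backend with `Pipeline.load_from_yaml()`.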

The first project phase saw the rapid development of a working system. However, the Lab IA team identified some room for improvement. During evaluation, they noticed that some of the pipelines were underperforming: the retriever often acted as a bottleneck, resulting in incorrect answers. “We noticed that the retriever wasn’t passing on the correct documents to the reader,” Robin explains. “So we decided to focus on improving the retriever.” To that end, the team collaborated closely with INRIA to create a more accurate retriever.

Improving Document Retrieval through Reranking

As Lab IA began assessing the pipelines’ performance, Abdenour Chaoui joined the team for a six-month internship. “We needed a very good student to come work with us on this problem,” says Oana Balalau, a researcher at INRIA who supervised Abdenour. “This student was Abdenour.” Once it became clear that the retriever was causing the drop in performance, Abdenour set out to improve its results as part of his Master’s thesis.

Initially, the team thought that they could improve the pipeline’s performance by simply having the retriever pass more documents to the reader. However, Abdenour says that “the more documents we provided to the reader, the less accurate it became.” The team clearly needed an alternative solution: “Our goal became to provide fewer documents, but with higher accuracy,” Abdenour recounts. This is how he and his supervisors came up with the idea of a reranker.

“The reranker’s task is to push the relevant documents to the top of the stack, so that the documents that the reader sees have higher relevance,” explains Abdenour. Piaf-ranker, the final reranking model, is based on CamemBERT and has led to an absolute improvement of 12% in recall on three different datasets. Thanks to Piaf-ranker, the reader is able to select more accurate answers to users’ queries.
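In Haystack 1.x terms, inserting a reranker between retriever and reader looks roughly like the sketch below, reusing the `retriever` and `reader` from the earlier example. The cross-encoder named here is a generic public stand-in; in Lab IA’s setup, the ranking model would be the CamemBERT-based Piaf-ranker:

```python
from haystack.nodes import SentenceTransformersRanker
from haystack.pipelines import Pipeline

# Stand-in cross-encoder; Lab IA's Piaf-ranker plays this role in production.
ranker = SentenceTransformersRanker(
    model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2"
)

# Retrieve generously, then let the ranker pass only the most
# relevant documents on to the reader.
pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=ranker, name="Ranker", inputs=["Retriever"])
pipe.add_node(component=reader, name="Reader", inputs=["Ranker"])

result = pipe.run(
    query="Comment renouveler un passeport ?",
    params={"Retriever": {"top_k": 20}, "Ranker": {"top_k": 3}},
)
```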

Future Outlook

While Lab IA’s results in isolated tests have been encouraging, its on-site search systems have yet to be implemented by the various government bodies. The team members themselves are confident that semantic, domain-specific search systems are the way to go—but some decision-makers still need to be convinced. Guillaume is optimistic: “We need to conduct some tests to demonstrate how our systems improve user satisfaction. That’ll convince everyone of the superiority of on-site semantic search over keyword-based approaches.”

What is certain is that Lab IA has established a good relationship with our team here at deepset. “For the last seven or eight months we’ve had monthly calls with the deepset team,” says Guillaume. “They assisted us with the deployment of the final systems, and we have an ongoing exchange about how to improve our reranking model.” In a way, the two teams have grown in parallel. When the Lab IA team started working on the PiafML project, “the Haystack repository on GitHub only had a few hundred stars,” Guillaume remembers. “Now, it’s much bigger, and the documentation has evolved a lot.”

Build Your Own On-Site Semantic Search System

If you have a knowledge base of documents and want to make it searchable, look no further than Haystack. With our NLP framework, you can go far beyond vanilla keyword-based search, which often leaves users unsatisfied. Instead, Haystack lets you incorporate the latest Transformer-based language models, which power semantically informed document search and question answering systems.