Let the Experts Speak: Advanced Neural Search with Sooth.ai
As part of our case study series, we have chosen Sooth.ai as the first project to showcase Haystack’s capabilities.
At deepset we worked hard to create Haystack, a neural search framework that anyone can use to efficiently build end-to-end question answering systems. Our reward? A vibrant community that never ceases to amaze us with its creativity and resourcefulness. Our users have come up with hundreds of different applications, from private, small-scale projects to production-ready systems built for millions of users.
As part of our case study series, we have handpicked projects that showcase Haystack’s capabilities. You might even get inspired to start your own project, or find a solution to a problem you’ve been struggling with. In this first installment, we’ll talk about Sooth.ai, a neural search platform that relies on a curated set of resources.
What Is Sooth.ai?
Jeff Walsh, the founder of Sooth.ai, came up with the idea for a reliable, fact-based Q&A platform while working in the financial sector. Often, he would need quick, research-backed information to help him decide whether to, say, invest in a new technology. Google and similar search engines certainly satisfied the need for speed. But Jeff knew he couldn’t simply rely on the top results to provide him with the information he needed to make important decisions.
“You were forced to manually check for quality,” says Jeff. Doubts like “How old are the results?” and “How reliable are the sources?” would recur with each search. As any good researcher would say, fact-checking is a job in its own right. “I grew tired of looking for great, recent, and reliable information,” recalls Jeff. He knew that all the information was out there; someone only had to aggregate it and make it searchable.
The idea to build a database of research-backed documents thus came naturally. Sooth.ai now hosts about 25,000 English-language research reports from 63 reputable sources, including global think tanks, government agencies, and university research centers. These reports cover a range of highly researched topics, such as the economy, foreign policy, technological issues, and the environment.
Populating the Sooth.ai database with reliable information was the first step towards realizing Jeff’s vision. The second was to make these documents searchable through natural language. Luckily, he found just the framework for the job.
Leveraging The Haystack Framework
Once Jeff had decided to make his vision a reality, he joined forces with Gabriel Ronai, an experienced backend engineer. How could the duo make tens of thousands of documents searchable through a natural language interface? Jeff had discovered Haystack through an episode of Rob de Feo’s Startup Engineering podcast and was eager to try it out. The rest is history.
“Quickly, we knew that Haystack was the way to go,” says Gabriel. “We liked the structure and flexibility provided by pipelines.” Gabriel himself did not have any experience with natural language processing (NLP). “I had previously used AI in image processing, but never NLP,” he says. “But I found Haystack super easy to navigate.”
The team set up their first prototype model, using the default extractive question answering pipeline with the RoBERTa reader model. Even with this basic setup, “the results were surprisingly good,” says Jeff. “It’s pretty impressive what the out-of-the-box performance looked like.” But there was still room for improvement. Jeff observed that “the default language model performed really well on certain types of informative content, such as clean energy, technology, climate change.” However, when the topics veered into more contentious territory, the team saw a dip in performance. “In a topic like foreign affairs, the content got more challenging and we noticed that the models did not perform as well.”
Fine-tuning a Language Model for Better Performance
During the initial project phase, Jeff assembled a small team. Together, they annotated a dataset of a hundred question-answer pairs, which allowed them to compare the performance of different language models. Once a prototype question answering pipeline was in place, the team went back to prepare larger datasets to be used for fine-tuning.
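To make the idea of comparing models on a small annotated dataset more concrete, here is a minimal sketch of how predicted answers can be scored against gold answers. The article does not say which metrics the Sooth.ai team used; exact match and token-level F1 are assumed here because they are common choices for extractive question answering. The function names, question IDs, and toy answers are all hypothetical, and this is plain Python rather than Haystack's own evaluation code.

```python
from collections import Counter
import re
import string


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation and English articles, then tokenize."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized answers are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between the two answers."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: dict[str, str], gold: dict[str, str]) -> dict[str, float]:
    """Average both metrics over every annotated question."""
    n = len(gold)
    em = sum(exact_match(predictions.get(q, ""), a) for q, a in gold.items()) / n
    f1 = sum(f1_score(predictions.get(q, ""), a) for q, a in gold.items()) / n
    return {"exact_match": em, "f1": f1}


# Hypothetical gold labels and the answers of two hypothetical models.
gold_answers = {"q1": "the blue whale", "q2": "solar panels"}
model_a = {"q1": "Blue whale", "q2": "rooftop solar panels and batteries"}
model_b = {"q1": "an elephant", "q2": "solar panels"}

scores_a = evaluate(model_a, gold_answers)
scores_b = evaluate(model_b, gold_answers)
print("model A:", scores_a)
print("model B:", scores_b)
```

In this toy case both models get one answer exactly right, but model A earns partial credit on the other, so it ends up with the higher F1. With around a hundred annotated pairs, the same averaging gives a quick first signal for comparing candidate models.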
In deep learning, fine-tuning is an efficient method for leveraging the power of pre-trained models and adjusting them to your use case. It’s a technique that can be used in different domains, be it image recognition or NLP. What makes fine-tuning so useful is that you only need a fraction of the millions of data points required to train a model from scratch.
The Sooth.ai team sought to “add some scientific rigor” to the labeling process, according to Jeff. They split into working groups, read Haystack’s labeling instructions, and used the Haystack annotation tool to assist with the workflow. The attention to detail paid off: after fine-tuning the base model on 700 custom annotated data points, the team saw “a massive jump in performance,” says Jeff.
User-Oriented Documentation for the Win
The Sooth.ai team has had a positive experience with the Haystack framework and the support they have received from deepset. According to Jeff and Gabriel, someone from the deepset team has always been accessible to answer any questions, provide workflow recommendations, or point them to a specialist who might be able to help with a more technical question. When the Sooth.ai team filed a GitHub issue, it was addressed within a week. But what stood out most to the Sooth.ai team was the documentation.
“The overall documentation is great,” says Jakub Patrik, who has been with Sooth.ai throughout its Haystack implementation. “The comments are informative and comprehensive, so you can easily figure out which part of the code is doing what.” Jeff agrees: “From a non-technical perspective, the documentation is really, really good.” Gabriel also emphasizes the value of the Haystack tutorials: “The tutorials — and the one on fine-tuning stands out — were of great help to us because they also included best practices.”
The Sooth.ai team will continue to explore the Haystack framework as a source of new functionality and added value for users. They’re currently in the process of evaluating different models on their newest evaluation dataset, which includes 2,000 labels. Additionally, the team is looking into adding more nodes to the retriever-reader pipeline, including a metadata filter that could speed up the retrieval process.
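Why might a metadata filter speed up retrieval? One way to picture it: the filter narrows the pool of candidate documents before any scoring happens, so the retriever has fewer documents to rank. The sketch below illustrates that idea in plain Python. It is not Haystack's API and not Sooth.ai's implementation; the `Document` class, the `retrieve` function, the metadata keys, and the simple term-overlap scoring are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    content: str
    meta: dict = field(default_factory=dict)


def matches(doc: Document, filters: dict) -> bool:
    """A document passes if every filtered key holds one of the allowed values."""
    return all(doc.meta.get(key) in allowed for key, allowed in filters.items())


def retrieve(query: str, documents: list, filters: dict = None, top_k: int = 2) -> list:
    """Filter on metadata first, then rank the survivors by shared query terms."""
    candidates = [d for d in documents if matches(d, filters or {})]
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(d.content.lower().split())), d) for d in candidates]
    scored = [(score, d) for score, d in scored if score > 0]
    # list.sort is stable, so ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


# Hypothetical mini-corpus with hypothetical metadata fields.
docs = [
    Document("Solar power costs fell sharply", {"topic": "energy", "year": 2021}),
    Document("Solar tariffs and trade policy", {"topic": "trade", "year": 2021}),
    Document("Wind and solar power capacity grew", {"topic": "energy", "year": 2015}),
]

filtered = retrieve("solar power", docs, filters={"topic": ["energy"], "year": [2021]})
unfiltered = retrieve("solar power", docs)
print([d.content for d in filtered])
print([d.content for d in unfiltered])
```

With the filter in place, only one document is scored at all; without it, all three are. On a collection of tens of thousands of reports, skipping most of that scoring is where the time saving would come from.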
At present, the Sooth.ai platform, which is still in beta, is seeking users who rely on trusted research, be it university students, journalists, or policymakers. Ultimately, though, Sooth.ai wants to target anybody who’s looking for reliable and quick answers to fact-seeking questions. For more information, visit https://sooth.ai and try it out for yourself.
Get Started with Haystack
Looking to realize your own vision using the most recent advances in natural language processing? Make sure to check out the Haystack GitHub repository. And if you want to connect with the great folks in our community or show off your own project, join us on Discord.