Haystack v1.0 Official Release

Start creating NLP services for question answering, semantic document search and much more!

08.12.21

Since the inception of deepset, we have held a strong belief that the developments coming out of the world of research would bring about significant improvements to the way that text is processed and searched through in enterprise settings. To pursue this, we started writing code to make the latest transformer models easier to use, more scalable and more suited to production environments.

Over time, the code, tools and resources that we created have been incrementally added to Haystack, our open source Python framework. And as our ambitions have grown, so has the response from our community, whose members keep finding new and exciting ways to use and improve Haystack. Looking over what we have created, we now see that it is starting to resemble the vision that we had in mind when we first embarked on this journey. And so it is with great pleasure that we announce today the release of Haystack v1.0!

This release brings a raft of significant new features, including support for table question answering, reworked evaluation and simpler debugging, which we recommend reading about in our blog article. But to really understand the significance of this release, we want to talk a bit about the state of search technologies today, the needs of our users and how Haystack fits into the enterprise technology ecosystem.

The State of Search

Search is ubiquitous. It’s expected to be packaged with just about any program that interacts with text. With breakthroughs in machine learning and natural language processing (NLP), search technology has taken a big leap forward and Haystack dares to take the leap with it. The latest search systems that you can build in Haystack can now handle full-sentence questions with sensitivity to their syntax and semantics. They no longer rely solely on keyword-matching algorithms like TF-IDF and BM25, as the last generation of systems did. Thanks to these developments, search now comes in many different flavors and has become one of the most popular applications of NLP.
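To make the limits of keyword matching concrete, here is a minimal, self-contained sketch of BM25 scoring in plain Python. It is not Haystack code: the function names, the whitespace tokenizer, the toy documents and the parameter defaults (k1=1.5, b=0.75) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems do much more).
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue  # a term that is absent contributes nothing
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


documents = [
    "the capital of france is paris",
    "berlin is the capital of germany",
    "paris hosts the seat of the french government",
]
scores = bm25_scores("france capital", documents)
```

The third document is clearly about France, yet it scores zero because it contains neither "france" nor "capital" as literal tokens. Closing that gap between wording and meaning is what the semantic approaches described above aim to do.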

Also, research from elite institutions has pushed previously aspirational tasks, like question answering and summarization, firmly into the realm of both the possible and the practical. Transformer models have been a revelation in NLP. Performance on many tasks increased greatly in a short period of time, and the transfer learning paradigm has allowed new models to be trained with only a modest amount of data. Transformers are now the core technology powering the large majority of the best systems out there.

With these developments, we have seen growing interest among developers in trying out the latest methods and building their own NLP services. Haystack has been a toolkit providing everything they need to create scalable custom NLP systems that perform semantic search, document ranking, question answering and summarization, just to name a few. In fact, some of these creations are already in use at companies such as Etalab, Airbus and Alcatel Lucent.

But to understand why Haystack has been so valuable to these teams, it’s important to zoom out and see the full scope of the repository.

From Storage to Models to Deployment

We can’t stress enough that Haystack is not just about models. It’s an end-to-end framework. Our aim is to provide tooling and support for any step in the process of deploying an NLP service, such as a semantic search system or a question answering engine, via an API. For example, we understand the impact that choosing the right storage option can have on the success of your project. As such, we provide support for popular options like Elasticsearch, OpenSearch and SQL databases, as well as vector-optimized options like Milvus, FAISS and Weaviate, in our DocumentStore classes.

Also, a core design feature of Haystack is the Pipeline. This feature enables different components to be chained together and handles the routing of data dynamically. For example, you can create a standard question answering system with a DocumentStore, a Retriever and a Reader using a Pipeline. However, you might also want to have different Retrievers handle different types of queries, create replicas of the Reader in a distributed setting to speed up inference, or even slot a custom node that you defined into an existing system. All of this is possible through the flexibility and power that the Pipeline offers. We know that it is crucial for many that pipelines are easily defined, and so we have provided the ability to configure, save and load full pipelines from a single YAML file.
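The idea of chaining components can be sketched in a few lines of plain Python. This is a toy illustration only, not Haystack's Pipeline API: the ToyPipeline class, the retriever and reader functions and the two sample documents are hypothetical names invented for this example, and the sketch covers only a linear chain, not dynamic routing or YAML loading.

```python
from typing import Callable, List, Tuple

# A node is any callable that takes the shared state and returns a new state.
Node = Callable[[dict], dict]

# Hypothetical in-memory "document store" for the sketch.
DOCS = [
    "Haystack is an open source Python framework.",
    "Elasticsearch is a popular document database.",
]


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not Haystack's Pipeline class."""

    def __init__(self) -> None:
        self.nodes: List[Tuple[str, Node]] = []

    def add_node(self, name: str, component: Node) -> None:
        self.nodes.append((name, component))

    def run(self, query: str) -> dict:
        # Pass the state through each node in the order it was added.
        state = {"query": query}
        for _name, component in self.nodes:
            state = component(state)
        return state


def retriever(state: dict) -> dict:
    # Keep the documents that share at least one word with the query.
    words = set(state["query"].lower().split())
    hits = [d for d in DOCS if words & set(d.lower().rstrip(".").split())]
    return {**state, "documents": hits}


def reader(state: dict) -> dict:
    # Toy "reader": return the first retrieved document as the answer.
    docs = state["documents"]
    return {**state, "answer": docs[0] if docs else None}


pipeline = ToyPipeline()
pipeline.add_node("Retriever", retriever)
pipeline.add_node("Reader", reader)
result = pipeline.run("haystack framework")
```

Because every node shares the same input and output shape, swapping in a different retriever or adding a custom node is just another add_node call, which is the property the paragraph above describes.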

Of course, many users need to be able to fine-tune models to their specific, often jargon-filled domains. Haystack’s domain adaptation and evaluation features are designed to ease this process. We also know from personal experience that data labeling and annotation are crucial and often very laborious steps in a data-driven project. The Annotation Tool not only provides a graphical interface for labeling, but also offers features that help teams organize their work.

Finally, a lot of engineering effort is required to take a demo script and turn it into a production system. A truly useful NLP service needs to start and stop reliably and run in different hardware environments, which is why we have included both Docker and Docker Compose files. It also needs to handle an incoming stream of requests and return results in a format that is readily consumable by other programs. For this, we have provided a simple process for Haystack users to deploy their Haystack Nodes and Pipelines as a fully fledged service, complete with a REST API. These features are not afterthoughts to Haystack but rather core to the purpose of the framework, and we believe that this is what has so far drawn many developers to our repository.

This is Just the Beginning

As more developers start building search programs with features like question answering, semantic document search or summarization, we are going to see a paradigm shift take place. Users will start to expect specific answers to questions, presented along with their supporting documents. We can’t expect those working in information-seeking roles to wade through large swathes of text anymore without clear and effective summaries. And search will no longer be synonymous with keyword search. Haystack is our contribution to this shift and, with the support and feedback of our outstanding open source community, we will continue striving to make it the best tool available to anyone looking to work with text in a smarter way.

If you’re ready to try out Haystack for yourself, please check out our Quick Demo or learn how to install it on our Get Started page. If you still have any questions or would just like to chat with us, feel free to join our Discord channel. We’re excited for what is to come in the near future and we hope that you’ll join us on this journey!