Using Haystack with OpenSearch
OpenSearch is a great open source engine for storing unstructured data that you can search with Haystack.
In July 2021, Amazon released OpenSearch, an Elasticsearch-based project that replaces Open Distro, Amazon’s former distribution of Elasticsearch. Haystack supports OpenSearch, among many other storage backends, as your underlying document store. In fact, Haystack’s semantic search and question answering capabilities integrate smoothly with OpenSearch’s indexing and vector operations.
What Is Haystack?
Haystack is an open source NLP framework that is designed to create smart search solutions for large collections of textual data. It allows you to build semantic search pipelines on top of any document store, with optional nodes for question answering, summarization, translation, document ranking, and other NLP applications. Whatever your use case, the first step in a semantic search pipeline always requires retrieving documents from a document store. This is where Haystack and OpenSearch meet.
OpenSearch at a Glance
OpenSearch is Amazon’s open source fork of the popular search and analytics engine Elasticsearch. OpenSearch’s open source Apache 2.0 license means that all its features, including security features and SQL support, are offered free of charge. Like Elasticsearch (ES) itself, OpenSearch follows a NoSQL design and stores documents in a distributed manner. It comes with some natural language search options that make the storage engine very popular for NLP applications.
Leverage Haystack Semantic Search to Query Your OpenSearch Document Database
In semantic search, the match between a query and the documents is evaluated on the basis of semantics, that is, meaning rather than lexical overlap. While a lexical, keyword-based approach simply looks at the terms in a query and compares them to the documents in the database, semantic search uses Transformer-based language models to compute a vector representation of both the query and the documents. It then returns the best-matching documents using vector similarity metrics like cosine similarity. The semantic method therefore complements OpenSearch’s keyword search.
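To make the vector comparison concrete, here is a minimal sketch of ranking documents by cosine similarity. The three-dimensional vectors and document names are made up for illustration; in a real pipeline they would come from a Transformer-based embedding model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, not produced by a real model).
query_vector = [0.9, 0.1, 0.0]
document_vectors = {
    "doc_about_search": [0.8, 0.2, 0.1],
    "doc_about_cooking": [0.0, 0.1, 0.9],
    "doc_about_indexing": [0.6, 0.6, 0.0],
}

# Rank document names from most to least similar to the query.
ranked = sorted(
    document_vectors,
    key=lambda name: cosine_similarity(query_vector, document_vectors[name]),
    reverse=True,
)

for name in ranked:
    score = cosine_similarity(query_vector, document_vectors[name])
    print(f"{name}: {score:.3f}")
```

Documents whose vectors point in nearly the same direction as the query vector score close to 1, regardless of whether they share any keywords with the query.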
Thanks to the accuracy of results returned by vector-based semantic search, more and more tools, such as the FAISS library and the Milvus database, now facilitate working with vectors from the start. Similarly, OpenSearch comes with a k-Nearest Neighbor (k-NN) plugin that is specifically aimed at use cases involving the identification of similar document vectors (k-NN describes a family of algorithms that find the k data points that are most similar to a given data point).
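As a rough sketch of what working with the k-NN plugin looks like, the snippet below builds the two JSON bodies involved: an index definition with a `knn_vector` field, and a `knn` query that asks for the k nearest vectors. The index name, the field names, and the dimension of 768 are assumptions chosen for illustration; the dimension has to match whatever embedding model you use.

```python
import json

# Hypothetical names and sizes, chosen only for this example.
INDEX_NAME = "documents"
VECTOR_FIELD = "embedding"
EMBEDDING_DIM = 768

# Body for creating a k-NN enabled index with one text and one vector field.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            VECTOR_FIELD: {"type": "knn_vector", "dimension": EMBEDDING_DIM},
        }
    },
}


def build_knn_query(query_vector, k=3):
    """Build a search body that requests the k nearest document vectors."""
    if len(query_vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(query_vector)}")
    return {
        "size": k,
        "query": {"knn": {VECTOR_FIELD: {"vector": list(query_vector), "k": k}}},
    }


# A placeholder query vector; a real one would come from an embedding model.
query_body = build_knn_query([0.0] * EMBEDDING_DIM, k=3)

print(json.dumps(index_body, indent=2))
```

These bodies would be sent to the index creation and `_search` endpoints of a running OpenSearch instance. When you use OpenSearch through Haystack, the document store is expected to handle this for you, so the sketch is mainly useful for understanding what happens underneath.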
Learn How to Use Haystack with OpenSearch
Learn how to get started with OpenSearch using Docker on its official website.
Once you have installed Docker, follow the instructions to set up your OpenSearch document store in Haystack.
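For orientation, connecting to a local OpenSearch instance might look like the sketch below. It assumes the Haystack 1.x package layout, where `OpenSearchDocumentStore` lives in `haystack.document_stores`; other Haystack versions may expose it differently. The host, port, credentials, index name, and embedding dimension are placeholder values, so replace them with those of your own setup.

```python
# Placeholder connection settings for a local OpenSearch container.
# All values here are assumptions; adjust them to your environment.
OPENSEARCH_SETTINGS = {
    "host": "localhost",
    "port": 9200,
    "username": "admin",
    "password": "admin",
    "index": "document",
    "embedding_dim": 768,
}


def build_document_store(settings=None):
    """Create a Haystack document store backed by OpenSearch.

    The import is done lazily so that this module can be loaded
    even on machines where Haystack is not installed.
    """
    # Assumes Haystack 1.x; the import path may differ in other versions.
    from haystack.document_stores import OpenSearchDocumentStore

    return OpenSearchDocumentStore(**(settings or OPENSEARCH_SETTINGS))


if __name__ == "__main__":
    # Requires a running OpenSearch instance and an installed Haystack.
    document_store = build_document_store()
    print(type(document_store).__name__)
```

Keeping the settings in one dictionary makes it easy to swap in the values of a remote cluster later without touching the rest of your pipeline code.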
If you’re unsure which document store is the best for your application, check out our quick guide to choosing the right document store.