What Semantic Search Can Do for You

Is a natural language interface the future of how people interact with data?

05.03.21

Andrey A.

Search permeates our lives. Whether we are hunting for the apartment keys yet again or looking for an affordable flight on the web, search is and always will be present. In our digital lives especially, it is hard to cope with the ever-increasing volumes of information. What we’re looking for is often hidden deep inside all sorts of poorly organized text data. It becomes even more challenging within an enterprise environment, where looking for an important piece of information feels more and more like searching for a needle in a haystack.

Should We Settle for a Keyword Search?

Search tools, often based on simple keywords, are integrated into almost all modern applications. They have saved us much time and effort in finding information in our often overwhelming enterprise databases, wikis, websites, and applications. However, in many cases they still fail. Instead of providing a particular piece of knowledge or pointing us in the right direction, these traditional tools do not return a manageable number of relevant results.

Lately, modern technologies such as Natural Language Processing (NLP) have enabled a new paradigm of search and offer the power to massively improve our search experience. In particular, NLP allows us to shift, where applicable, from elaborate keyword-based matching towards the semantic understanding of our queries and questions. This way, we do not necessarily have to remember specific keywords or terminology to retrieve the desired information. Semantic search technologies such as Question Answering (QA) enable us to query text data by simply typing our questions into the search bar, then getting full answers rather than just a bunch of links to the documents matching the keywords.

However, when does semantic search enhance our search experience, and when would a keyword-based search be sufficient?

Navigational vs. Informational Search

Based on what we have learned from our clients and the open source community, we’d argue it all depends on the intent of the search. Almost every search query is triggered by an intent. We do not search because searching is so much fun. We search because we aim to meet a specific objective. For example, we may need to find a relevant website, explore an interesting topic, make an important decision, purchase a much-needed item, or learn a new language. These intentions make the search either informational or navigational by nature.

Navigational search queries are driven by the need to be directed to a certain point on the Internet or inside the enterprise database. We use navigational search when we have a general idea about what we need to find, but we do not know a direct path and use the “navigation” to suggest one place after another.

This means that when we conduct a navigational search within enterprise databases, our intent is similar to checking “points of interest” in a car navigation system, or to entering an incomplete address step by step by following suggestions.

In some cases, we also use navigational search to browse for a collection of items. For example, when searching for “project management tools,” we aim to investigate various applications to identify and select the one that fits best. We have an idea about the “destination point,” and we’d like the search engine to suggest various options to pick from.

In many cases, keywords are still sufficient for navigational search.

Informational search is something different. It is usually about accessing the knowledge directly and right away, not about browsing a collection of potentially relevant results from a keyword-based search. Informational search also usually stems from having a specific question in mind.

We often do not know where to look for that particular result. We may lack the specific terminology to describe the desired answer, especially when we are new to the knowledge area. Because we don’t know in advance which “keywords” the answer will contain, we can’t quite rely on a keyword-based search in this situation.
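The vocabulary gap described above can be made concrete with a small sketch. It is not how any particular search product works; it is a toy keyword matcher in plain Python, and the two documents and both queries are invented purely for illustration. It shows that a query phrased in everyday words finds nothing when the relevant document uses different terminology.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, documents):
    """Return the documents that share at least one word with the query."""
    query_tokens = tokenize(query)
    return [doc for doc in documents if query_tokens & tokenize(doc)]


# Hypothetical enterprise documents (assumed for this example only).
documents = [
    "Employees accrue 25 days of paid leave per year.",
    "The cafeteria is open from 8 am to 3 pm.",
]

# A newcomer uses everyday words; the policy document says "paid leave".
newcomer_hits = keyword_search("vacation allowance", documents)

# Someone who already knows the terminology matches the document directly.
expert_hits = keyword_search("paid leave", documents)

print(newcomer_hits)
print(expert_hits)
```

The newcomer's query shares no word with the leave policy, so the matcher has nothing to return, even though the document answers the question. A semantic approach aims to close exactly this gap by comparing meaning rather than surface words.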

Our experience of helping enterprise customers with these problems suggests that semantic search can overcome the limits of a keyword-based approach and drastically improve informational search results.

What Semantic Search Can Do for You

Creating and interacting with knowledge is a major part of an enterprise environment. Many efforts have been made to preserve that knowledge by accumulating reports, executive summaries, meeting protocols, and other unstructured data. But as organizations grow, the amount of such data grows beyond what is manageable. Finding a piece of information becomes much less feasible unless we put search terms into context.

Think of someone who wants to learn about applications of Transformer-based NLP models and what they have been used for within the company. It is unlikely that a search query containing just the keywords “transformer,” “applications,” and “built” will lead to meaningful information, because the context has been lost.

Semantic search, in contrast, lets us pose the question itself against the text data, for example, by asking, “What to build with transformers?”
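To see why reducing a question to keywords such as “transformer,” “applications,” and “built” loses context, consider the following minimal sketch. The stopword list and both questions are assumptions made up for this illustration. Two questions with different meanings collapse into the same set of keywords, so a purely keyword-based engine cannot tell them apart.

```python
import re

# A tiny, hypothetical stopword list, just large enough for this example.
STOPWORDS = {"what", "which", "were", "with", "for", "the", "to", "a", "by"}


def keywords(query):
    """Reduce a question to its unordered set of non-stopword terms."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return frozenset(t for t in tokens if t not in STOPWORDS)


# Two questions that ask for different things.
q1 = "Which applications were built with transformers?"
q2 = "Which transformers were built for applications?"

# After keyword extraction, word order and relations are gone.
same_keywords = keywords(q1) == keywords(q2)

print(sorted(keywords(q1)))
print(same_keywords)
```

Both questions end up as the same three terms, which is the information a keyword engine actually receives. A semantic search system works with the full question instead, so the relation between the words is still available to it.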

However, knowledge in the enterprise context goes beyond applying the 80/20 rule and pinpointing the twenty percent of documents that contain the most relevant answers. Managing knowledge within the enterprise is also about finding people, for instance, the experts in a certain field.

The vast amount of documentation about previous projects often speaks not just of what has been done but also of who has done it. With a semantic search engine, it is unnecessary to read the reports one by one to identify people with specific expertise. The search can tell us who has the experience we are interested in.

To illustrate this, imagine asking the search engine, “Who has experience with Transformers?”

Technologies like NLP and QA equip search tools with a semantic understanding of both the text data and the queries. They therefore serve as a natural language interface to enterprise knowledge. These semantic search engines will no longer offer just a simple “keyword matcher.”

Instead, they provide an interface to interact with information in all formats, bringing us closer to much more generic and flexible virtual assistants. One particular driver for this kind of search will be the pervasiveness of conversational AI, that is, chatbots and voice assistants.

We believe all of the above will lead us further away from rigid keywords and will push search systems to interpret the most powerful communication protocol: human language.

Enter Haystack

Over the last couple of years, we’ve developed Haystack, our open-source framework for building end-to-end question answering systems and semantic search pipelines.

Haystack offers a practical way of implementing information-intensive search workflows by providing a natural language interface for unstructured data. It’s aimed at developers who need to get the job done, and it bridges the gap between the most recent achievements in NLP and real-world user requirements.

We want to thank everyone who’s interested in Haystack and has been supporting us on our journey. We encourage you to join our friendly community and to help us spread the word by starring the repository and following us on Twitter.

Stay tuned for more insights around Haystack!