The Implementation Cycle in Applied Natural Language Processing (NLP)

How to build winning applications using natural language processing (NLP): a practical guide


Software development is now widely understood as a cyclical process, with frequent iterations of planning, coding, testing, and deploying leading to the best results. This is because the way a system is designed to work can differ vastly from how it’s used in the real world. Therefore, developers can only get a realistic picture of their product’s ability to solve an actual problem by subjecting it to frequent tests. Once they have evidence of how the system actually performs, they can use it to tweak and tune the product.

There’s no reason not to apply these same principles to projects that incorporate natural language processing (NLP). But because developers can find it hard to wrap their heads around the complexities of modern NLP as a whole, they might not follow best practices when implementing NLP projects. At deepset, we’ve seen many teams develop large-scale NLP systems in a linear fashion, only to realize, after months of development, that their application didn’t solve the right problem. Far too often, those projects fizzle out, wasting budget and discouraging anyone involved from ever trying again.

The solution? Deploy often, test early, and prioritize involving real-world users from the very beginning. To help you think about the NLP development process from start to finish, we have developed this handy guide to the implementation cycle in applied NLP.

What is applied natural language processing?

Unlike theoretical NLP, applied NLP focuses on providing developers with the tools they need to leverage pre-trained language models to benefit their organization in practice. People just learning about theoretical NLP are sometimes intimidated by the complexity of large, Transformer-based language models. These complicated fundamentals are less of a barrier in applied NLP: you will rarely be training a language model entirely from scratch. 

Thanks to model sharing, you can go to centralized locations like the Hugging Face model hub, where tens of thousands of pre-trained models are freely available. You can also use an interface like OpenAI’s API to access their language models without even leaving your IDE. Due to this ease of sharing, everyone can benefit from the huge leaps that research in NLP has made since the inception of the Transformer architecture for language modeling. 
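To make this concrete, here is a small sketch of how you might organize the choice of hub models by task. The mapping and helper function are our own illustration (not part of any library); the model ids are well-known checkpoints from the Hugging Face model hub, and the commented-out `transformers` call shows how one line of code loads a shared model:

```python
# Illustrative mapping from task to a pre-trained Hugging Face hub model id.
TASK_TO_MODEL = {
    "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
    "question-answering": "deepset/roberta-base-squad2",
}

def pick_model(task: str) -> str:
    """Return a hub model id for the given task, or raise if unsupported."""
    try:
        return TASK_TO_MODEL[task]
    except KeyError:
        raise ValueError(f"No default model configured for task: {task!r}")

# With the `transformers` library installed, loading a shared model is one
# call (commented out here because it downloads the model weights):
# from transformers import pipeline
# classifier = pipeline("sentiment-analysis", model=pick_model("sentiment"))
# print(classifier("Applied NLP is refreshingly practical."))
```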

Implementing a natural language processing system

The NLP implementation process consists of two interlocking and overlapping phases: prototyping and machine learning operations (MLOps). In the prototyping phase, the developer sets up a working prototype pipeline and experiments with different configurations. In applied NLP, we usually opt for rapid prototyping: a workflow in which a prototype system gets developed and deployed quickly, to make sure that we collect user feedback very early on in the process. This way, we can iterate through prototypes quickly, constantly improving and refining our system.

Once the cycle of prototyping, deployment, and user feedback has produced a satisfactory system, the second phase, MLOps, begins. In MLOps, the system is deployed and integrated into the final product. But it doesn’t stop there. Rather, MLOps provides the framework for the regular monitoring, updating, and improvement of a system in production. Because modern NLP is so driven by data, you need to make sure that your language models are regularly re-trained on textual data that captures your real-world use case.

Both the prototyping and MLOps phases therefore heavily feature the testing and evaluation of models, both quantitatively and qualitatively. Only by periodically testing your system on real-world data and with real users can you make sure that it remains up to date.

Evaluating natural language processing models

Some NLP models are harder to evaluate than others. For example, if a sentiment classifier labels a text as positive while the correct label is negative, then we can safely say that it made a mistake. However, when it comes to generating or extracting text, it isn't always so easy to say what's wrong and what isn't. Your summarization model may output a summary that's vastly different from the "correct" text — but it could be just as good, or even better. That's why it's important to evaluate NLP models not only quantitatively, but also qualitatively — by having real-world users test them.
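The gap between "exactly right" and "differently worded but still good" is easy to see in code. The sketch below contrasts two common quantitative checks for extracted answers: strict exact match, and token-level F1, which gives partial credit for overlapping words (the function names are our own, though the metrics themselves are standard in question-answering evaluation):

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> bool:
    """Strict check: the answer must match the reference verbatim
    (ignoring case and surrounding whitespace)."""
    return prediction.strip().lower() == reference.strip().lower()

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: partial credit for shared words, so a differently
    phrased but overlapping answer still scores above zero."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("the city of Paris", "Paris")` is `False`, yet `token_f1` still awards the prediction a score of 0.4, which is why automatic metrics alone can understate the quality of a generated or extracted answer.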

The importance of user feedback

Rather than following the sequential data science model of finalizing a system before deploying and sharing it with end users, applied NLP makes user feedback an essential element of the production process. In the old model, it could very well happen that after months of development, you would realize that your final product wasn’t solving your users’ actual problem. By involving an example group of end users early on, you can analyze their feedback to improve your system — for instance, by collecting and annotating new, more representative data to fine-tune your models.

The steps of the natural language processing implementation process

Let’s now take a bird’s-eye view of the different phases of the NLP implementation process. 


  • Design your system pipeline. Here, you think about what you want your system to achieve. Does it need to extract answers from a large collection of documents? Does it need to translate queries or outputs? Consider potential future use cases as well, so that the pipeline will be flexible and capable of scaling up with your project.
  • Choose the models. The design decisions made in the previous step determine which kinds of language models are suitable for your system. In this phase, you’ll want to visit the model hub and decide on some models that you want to test in your pipeline.
  • Build the prototype. Set up the pipeline with your language models and connect it to your database. Now you already have a running system! If you’re looking to evaluate different language models, then you’ll likely build several pipeline prototypes at this stage.
  • Experiment, evaluate, and test. This phase serves to find the best setup for your use case and the resources you have available. Is the inference fast enough? Accurate enough? What do your users think about it? You need to deploy a preliminary system, get it into the hands of your end users, and collect their feedback, in addition to evaluating the system on a representative evaluation dataset. The more energy and resources you pour into this step, the better your system will become. 
  • Fine-tune your models. Collect real-world data for your use case, annotate it, and use it to fine-tune the models for higher accuracy. This step is followed by another iteration of experimentation, evaluation, and testing.
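The prototype-building step above can be wired up before any model is chosen. The sketch below uses toy, pure-Python stand-ins for the two components of a retrieve-and-read pipeline (a keyword-overlap "retriever" and a best-match "reader" — both our own illustrations, not a real framework API), which lets you exercise the pipeline's plumbing end to end:

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many query words they share."""
    words = set(query.lower().replace("?", "").split())
    scored = sorted(
        documents,
        key=lambda d: len(words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def answer(query: str, documents: list[str]) -> str:
    """Toy reader: return the best-matching document as the 'answer'."""
    hits = retrieve(query, documents, top_k=1)
    return hits[0] if hits else ""

docs = [
    "Haystack is an open source framework for applied NLP.",
    "The moon was full last night.",
]
print(answer("What is Haystack?", docs))
# prints "Haystack is an open source framework for applied NLP."
```

In a real prototype, each stand-in would be swapped for a language model from the hub; keeping the interfaces this simple is what makes it cheap to build and compare several pipeline variants.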


  • Deployment to production and integration. At this stage, you hand your prototype pipeline over to a back-end engineer, who will go on to deploy the system to production. It can then be integrated into the final product. 
  • Monitoring your system. After implementation, MLOps takes care of monitoring the product, periodically checking the system’s performance against updated evaluation data sets, as well as collecting feedback from real users. It’s important to make sure that your system remains up to date and doesn’t decay. If you notice a dip in your system’s quality, you’ll likely want to go back to collecting and annotating new data, which will help you fine-tune your language models anew. 
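A minimal version of that monitoring check might look like the following sketch. The threshold and function name are illustrative assumptions; the idea is simply to re-run your evaluation set against the deployed system periodically and flag decay:

```python
def needs_retraining(baseline_accuracy: float, current_accuracy: float,
                     tolerance: float = 0.05) -> bool:
    """Flag the system for new data collection and fine-tuning when
    accuracy drops more than `tolerance` below the recorded baseline."""
    return (baseline_accuracy - current_accuracy) > tolerance

# Example: baseline 0.88 on the evaluation set, latest production run 0.79.
# The drop of 0.09 exceeds the 0.05 tolerance, so the check fires.
print(needs_retraining(0.88, 0.79))
```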

Where prototyping and MLOps meet

So, in a way, the final phase consists of periodically repeating the steps outlined earlier — only now, you’re performing them on a system that’s already been deployed to production.

This cycle illustrates why, contrary to most people’s perceptions, building an NLP system has little to do with training a Transformer language model from scratch. Rather, the hard work in applied NLP is about making sure that you have all the parts you need: high-quality, annotated data, and a framework that will help you with setting up pipelines, connecting to the resources where pre-trained models are stored, and fine-tuning those models on your own data.

While the process of how to implement NLP in its entirety may seem a bit overwhelming, it helps to break it down into neatly defined steps, as we have done here.

Continue reading

This blog post has been adapted from our ebook “NLP for Developers” — a detailed guide to the entire NLP implementation process that fills in the gaps for developers aiming for the successful implementation of an NLP system. In the ebook, we address questions such as: 

  • Do I need to understand all the intricacies of the Transformer model architecture? 
  • How do I build a prototype that can be scaled to a production-ready system? 
  • How can I compare different pipeline architectures and hyperparameter settings to find those that work best for me? 
  • How can I adapt an existing language model to my use case? 
  • How can I collaborate with back-end engineers on my team towards the final deployment of the NLP system? 

Plus, we talk in depth about topics like data collection and annotation, evaluation, and the modularity of modern NLP systems. If that sounds good to you, follow this link to download the ebook for free. 

To learn more about the state of NLP today, check out our blog, and have a look at our first ebook “NLP for Product Managers,” which is lighter on the technical content and includes many real-world use cases of applied NLP.

When we’re not sharing knowledge about applied NLP through our blog or ebooks, we’re developing Haystack, an open source framework for applied NLP that aims to make the process as smooth and easy as possible, no matter your background. Have a look at the GitHub repository or the Haystack documentation page.