Intuitive QA in the Cockpit with Haystack

How Airbus built a combined table-and-text question answering (QA) system that returns accurate answers in under a second.

Aircraft are complex machines, and flying them requires pilots to access detailed information quickly, often under intense pressure. Flight Crew Operating Manuals (FCOMs) cover the issues a pilot might encounter. These manuals span thousands of pages, and although they are often digitized, pilots still find it very hard to locate the information they need, especially in critical situations.

To improve information discovery in the cockpit, Airbus’ artificial intelligence research unit set out to build a more intuitive system with a flatter learning curve that would return highly accurate information faster. Using Haystack, an open-source framework for applied natural language processing (NLP), the team built a composite question answering system that extracts answers from both plain text and tables in extensive manuals. It can pinpoint the right cell in a table across more than a thousand pages and return the correct answer in less than one second.

Managing Queries from the Cockpit

Airbus, a global leader in designing, manufacturing and delivering aerospace products, produces hundreds of manuals that help pilots handle a variety of queries and issues in the cockpit. Modern commercial aircraft use sensors to detect issues and handle them automatically according to protocol. But sensors can fail, and some problems, such as a passenger falling ill, cannot be detected by sensors at all. In those cases, pilots must find the correct procedure quickly.

Currently, Airbus pilots search digitized manuals using basic keyword-based functionality. A keyword-based approach, however, requires them to know the exact words used to describe an issue. This is straightforward for experienced pilots, but others can struggle to find the combination of words that leads to the correct result, which wastes valuable time, especially in critical situations.

“In stressful situations especially, pilots need to get to the info as quickly as possible to be able to react in time. We wanted to assess whether a deep learning-based system for advanced question answering could reduce the retrieval time.” — Alexandre Arnold, AI research unit, Airbus

QA on Text and Tables

But the team faced a significant hurdle. Like many technical documents, FCOMs contain a lot of tables. During the project’s initial phase, the Airbus team realized that tables play such an integral part in how pilots extract information that the system wouldn’t be complete without a module for question answering on tables.

Airbus’ plan for a complex composite QA system for text and tables posed two challenges:

  1. The final system wouldn’t know whether the answer to a given query could be found in a table or in a piece of text. 
  2. It would have to retrieve the relevant table or text from the collection of FCOM pages itself.
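Both challenges come down to one requirement: a single retrieval step has to rank text passages and tables together without knowing in advance which type holds the answer. The article does not describe the retriever Airbus used, so the plain-Python sketch below is only an illustration under stated assumptions. The `Document` record, the row-by-row table flattening and the toy TF-IDF-style score are all hypothetical stand-ins for a real sparse or dense retriever, and the table values are invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Document:
    # Hypothetical record: a text passage (str) or a table (list of rows)
    content: Union[str, List[List[str]]]
    content_type: str  # "text" or "table"
    meta: dict = field(default_factory=dict)


def tokenize(s: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", s.lower())


def as_text(doc: Document) -> str:
    # Flatten tables row by row so one scorer can rank both content types
    if doc.content_type == "table":
        return " ".join(" ".join(row) for row in doc.content)
    return doc.content


def retrieve(query: str, docs: List[Document], top_k: int = 2) -> List[Document]:
    # Toy TF-IDF-style scoring; a stand-in for a real retriever
    n = len(docs)
    counts = [Counter(tokenize(as_text(d))) for d in docs]
    df = Counter(t for c in counts for t in c)  # number of documents containing t
    q = tokenize(query)
    scored = [
        (sum(c[t] * math.log(1 + n / df[t]) for t in q if t in c), d)
        for c, d in zip(counts, docs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]


# Invented example content, not taken from an FCOM
docs = [
    Document("Apply oxygen masks if cabin altitude exceeds the limit.", "text"),
    Document([["Condition", "Maximum crosswind (kt)"],
              ["Dry runway", "38"], ["Wet runway", "33"]], "table"),
    Document("In case of a sick passenger, contact the cabin crew.", "text"),
]
print(retrieve("maximum crosswind on a wet runway", docs, top_k=1)[0].content_type)  # table
```

Because tables are turned into text before scoring, the same query can surface a table for one question and a passage for another; the downstream reader then decides how to extract the answer.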

Data annotation and preparation

To tackle these challenges, the Airbus team drew on pilots’ real-world experience, organizing an internal hackathon in which participants annotated training data for fine-tuning a general-purpose language model to answer questions on text.
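The article does not say which annotation format the hackathon produced. Extractive QA readers in Haystack are commonly fine-tuned on SQuAD-style JSON, so the sketch below assumes that format; the helper names and the example context are hypothetical. It shows the key detail of extractive annotation: each answer is stored as a character offset into its context.

```python
import json


def make_squad_paragraph(context: str, question: str, answer: str, qid: str) -> dict:
    # Extractive QA labels store the answer as a character offset into the context
    start = context.find(answer)
    if start == -1:
        raise ValueError("answer must appear verbatim in the context")
    return {
        "context": context,
        "qas": [
            {
                "id": qid,
                "question": question,
                "answers": [{"text": answer, "answer_start": start}],
                "is_impossible": False,
            }
        ],
    }


def make_squad_dataset(title: str, paragraphs: list) -> dict:
    # Top-level SQuAD 2.0-style structure: data -> paragraphs -> qas
    return {"version": "v2.0", "data": [{"title": title, "paragraphs": paragraphs}]}


# Illustrative example; the context is invented, not taken from an FCOM
para = make_squad_paragraph(
    context="If a passenger is sick, inform the cabin crew and request medical assistance.",
    question="What should the crew do if a passenger is sick?",
    answer="inform the cabin crew and request medical assistance",
    qid="example-1",
)
dataset = make_squad_dataset("Illustrative manual section", [para])
print(json.dumps(dataset, indent=2))
```

Raising an error when the answer is not found verbatim catches a common annotation mistake early, before it silently produces a wrong training label.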

For tables, the team adapted a table QA model, which combines a deep language model’s grasp of natural-language semantics with the ability to navigate a table’s rows and columns to retrieve the right cell.
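The table QA model Airbus adapted is a learned neural model that the article does not name. Purely to make "navigating rows and columns to retrieve the right cell" concrete, the toy below replaces the learned model with word overlap between the query and the row and column labels. This is a deliberate simplification and not how neural table QA models such as TAPAS score cells. The table layout (header row first, row labels in column 0) and the values are assumptions invented for illustration.

```python
import re
from typing import List, Set


def tokens(s: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def select_cell(query: str, table: List[List[str]]) -> str:
    # Assumed layout: table[0] is the header row, column 0 of each body row is its label
    q = tokens(query)
    header, body = table[0], table[1:]
    # Column step: pick the data column whose header overlaps most with the query
    col = max(range(1, len(header)), key=lambda j: len(q & tokens(header[j])))
    # Row step: pick the row whose label overlaps most with the query
    row = max(body, key=lambda r: len(q & tokens(r[0])))
    return row[col]


# Invented values for illustration only
limits = [
    ["Runway condition", "Max crosswind (kt)", "Max tailwind (kt)"],
    ["Dry", "38", "15"],
    ["Wet", "33", "10"],
]
print(select_cell("max tailwind on a wet runway", limits))  # 10
```

A learned model replaces both overlap counts with contextual scoring, which is what lets it handle paraphrases, units and aggregations that simple word matching misses.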

Careful annotation was essential for both text and tables. Context is key in FCOMs, where different problems and solutions can be described in very similar terms. Haystack’s nodes, the modular building blocks that make up the final NLP system, made it easy to attach this context as metadata during preprocessing and to use it at retrieval time.
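The metadata idea can be sketched in plain Python. The sketch assumes a hypothetical preprocessing step that tags each paragraph with its enclosing section and a simple filter keyed on metadata fields; the `section` key and the helper names are illustrative, not Haystack’s API. The two invented passages share identical wording, so only the metadata can tell them apart.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def preprocess(section: str, paragraphs: List[str]) -> List[Doc]:
    # Hypothetical preprocessing step: tag each paragraph with its enclosing section
    return [Doc(p, {"section": section}) for p in paragraphs]


def apply_filters(docs: List[Doc], filters: Dict[str, List[str]]) -> List[Doc]:
    # Keep a document only if every filtered metadata field has one of the allowed values
    return [
        d for d in docs
        if all(d.meta.get(key) in allowed for key, allowed in filters.items())
    ]


# Invented placeholder text: two procedures described in identical terms
docs = preprocess("Engine fire", ["Shut down the affected engine."]) + preprocess(
    "Engine failure", ["Shut down the affected engine."]
)
hits = apply_filters(docs, {"section": ["Engine fire"]})
print([d.meta["section"] for d in hits])  # ['Engine fire']
```

An empty filter dictionary keeps every document, so the same function works whether or not the query supplies context.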

The solution

The final system runs each pilot’s query through both a table QA pipeline and a text QA pipeline. A final node then joins the results and selects the answer based on the confidence scores attached to each candidate.
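The join step can be sketched as merging both candidate lists and keeping the highest-confidence answers. This plain-Python version is an assumption-based illustration, not Haystack’s own join node; the `Answer` fields and example scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    text: str
    score: float  # model confidence, assumed to lie in [0, 1]
    source: str   # "text" or "table"


def join_by_confidence(
    text_answers: List[Answer], table_answers: List[Answer], top_k: int = 1
) -> List[Answer]:
    # Merge candidates from both pipelines and keep the top_k by confidence
    merged = text_answers + table_answers
    return sorted(merged, key=lambda a: a.score, reverse=True)[:top_k]


# Invented example outputs from the two pipelines
text_out = [Answer("contact the cabin crew", 0.42, "text")]
table_out = [Answer("33", 0.91, "table")]
best = join_by_confidence(text_out, table_out)[0]
print(best.source, best.text)  # table 33
```

Ranking by raw confidence assumes the text and table models produce scores on a comparable scale. If one model is systematically over- or under-confident, its scores would likely need rescaling or calibration before they are compared.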

Results and outlook

The team at Airbus is excited about the results, especially about the performance of the table section of the pipeline. “These documents are so long and have so much information encoded in tables. Being able to pinpoint the right cell in a table within more than a thousand pages, and, on that basis, provide the right answer in less than one second – that is extremely valuable,” said Arnold. While the current system has not been evaluated in a systematic user study yet, Arnold said early feedback from operations teams has been “very encouraging.”

“The results are promising. While we still need to work on the robustness of such systems and the operational performance before any deployment, we now see the value and the potential of the technology. We want to make sure that we keep up to speed with the space of NLP, so that our organization can leverage it from day one, once it's really perfectly useful for operations. This is really a gold mine.” — Alexandre Arnold, AI research unit, Airbus