Intelligent Document Processing with LLMs

Creating high-quality reports and portfolios using large language models: Three examples of LLM-powered IDP in the real world

Intelligent Document Processing (IDP) is a collection of software tools that organize and process enterprise documents to extract information and provide actionable insights. IDP has long helped organizations efficiently sort through mountains of documents, such as reports, contracts, and emails, to retrieve key data. When implemented successfully, it can significantly speed up document-centric workflows, free up staff for more strategic work, and reduce the risk of data entry errors.

Large language models (LLMs) take IDP a step further. Because they can both understand and generate natural language, they can play a more active and comprehensive role than earlier IDP technology. By grasping the context and meaning of documents and recognizing patterns in natural language, LLMs can extract and summarize information from hundreds of pages and even distill its essence into concise memos. In addition, the multimodal capabilities of LLMs are steadily improving, enabling many models to analyze and interpret not only text but also tables, images, and more.

LLMs have the potential to make IDP faster, more accurate, and more insightful than ever before. Organizations with document-heavy use cases need to understand how to incorporate LLM-based IDP into their workflows.

Because LLMs are both new and versatile, it can be hard to see exactly how they simplify workflows and accelerate IDP processes. This blog post sheds light on the many ways LLMs can power intelligent document processing, drawing on real use cases from our customers, many of whom have adopted LLMs extensively for IDP.

How LLMs Augment IDP

LLMs can handle complex documents, such as financial reports or legal contracts, that run to hundreds of pages. They have strong natural language understanding capabilities and can produce text comparable to human writing. Combined with well-designed prompts, they can take on much of the work of a human analyst, quickly sifting through large document collections to identify relevant data. They can then use that information to fill in predefined templates, summarize the key points of a document, and even generate reports or portfolios from the extracted information.

LLMs can significantly speed up processing large numbers of documents, or very long ones, and extracting specific information from them, whether the goal is to pull out hard facts, write high-level summaries, or even create a new document from a template.

Report Generation with Query Sets

Industries that require in-depth analysis, such as finance, insurance, healthcare, and legal services, are ideal candidates for LLM-powered IDP. Human analysts in these industries typically sift through data to answer predetermined guiding questions. For example, they may want to find out whether a company has sufficient financial reserves to cover all projected customer claims and liabilities for the coming year.

A common approach to using LLMs for IDP is to make these guiding questions explicit by breaking them down into a set of much more granular queries. The LLM then uses these queries to navigate through the documents and identify the most relevant answers in the data. Based on those answers, a report, summary, or portfolio is generated, either by the human analyst, by the LLM itself, or by both working together in a human-in-the-loop setup.
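
To make the query-set pattern concrete, here is a minimal Python sketch, not a description of any particular product. The `Query` class, the example questions, `build_report`, and `fake_answer` are all hypothetical; in a real system, `fake_answer` would be replaced by a call that retrieves passages and asks an LLM to answer from them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Query:
    """One granular question derived from a broader guiding question."""
    section: str   # report section the answer belongs in
    question: str


# Hypothetical query set for the guiding question
# "Can the company cover its projected claims and liabilities next year?"
QUERY_SET = [
    Query("Reserves", "What is the total amount of claims reserves?"),
    Query("Reserves", "Which assets back the reserves?"),
    Query("Liabilities", "What claims are projected for the coming year?"),
]


def build_report(queries: list[Query], answer: Callable[[str], str]) -> str:
    """Answer every query and group the results under their report sections."""
    sections: dict[str, list[str]] = {}
    for q in queries:
        # dicts keep insertion order, so sections appear in query-set order
        sections.setdefault(q.section, []).append(f"- {q.question}\n  {answer(q.question)}")
    return "\n\n".join(
        f"{name.upper()}\n" + "\n".join(entries) for name, entries in sections.items()
    )


def fake_answer(question: str) -> str:
    # Hypothetical stand-in for an LLM call over the document collection.
    return "[answer extracted from the documents]"


print(build_report(QUERY_SET, fake_answer))
```

Grouping answers by section keeps the output aligned with a fixed template, which is what makes the draft quick for a human analyst to check and edit.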

IDP with LLMs: Three Real-World Examples

Let's now look at three examples from industries that already run high-value LLM-based IDP applications in production. All three come directly from our user base.

Example 1: Auditing financial compliance

Insurance companies are required to prepare annual reports on their financial condition and submit them to a government agency. In the public interest, the agency then audits these reports to determine whether the insurance companies will be able to cover their expenses in the coming year.

Before adopting LLMs, the agency's auditors examined the reports by reading through them manually. Now, they run hundreds of tests with predefined queries designed to assess an insurance company's liquidity. For each query, the LLM extracts the most relevant sections of the report and presents them to the analysts, who use their expertise to judge whether the extracted information is accurate. This not only speeds up the tedious process of finding the right information in the reports but also allows for more in-depth analysis, since the analysts can ask far more questions than manual review would allow.
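
The per-query extraction step can be pictured as a retrieve-then-review loop. The sketch below is a simplified illustration under loudly stated assumptions: it ranks report sections by plain keyword overlap as a stand-in for the LLM or embedding model a real system would use, and the names `relevance`, `Finding`, and `extract_for_review`, as well as the report excerpts, are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, section: str) -> float:
    """Share of query words that also occur in the section.

    A crude, hypothetical stand-in for the LLM or embedding model that
    would rank sections in a real system.
    """
    q = tokens(query)
    return len(q & tokens(section)) / len(q) if q else 0.0


@dataclass
class Finding:
    query: str
    section: str
    score: float
    verified: bool | None = None  # set by an analyst after review


def extract_for_review(query: str, sections: list[str], k: int = 2) -> list[Finding]:
    """Return the k most relevant sections for one query, awaiting analyst review."""
    scored = [Finding(query, s, relevance(query, s)) for s in sections]
    scored.sort(key=lambda f: f.score, reverse=True)
    return scored[:k]


# Hypothetical excerpts from an annual report
report_sections = [
    "Liquid assets cover all short-term liabilities.",
    "The board met four times during the year.",
    "Cash and liquid assets are held in government bonds.",
]
findings = extract_for_review("Which liquid assets cover liabilities?", report_sections)
for f in findings:
    print(f"{f.score:.2f}  {f.section}")
findings[0].verified = True  # an analyst confirms the first extract
```

The design point is that every extract starts out unverified (`verified=None`), so the model only narrows the search while the judgment on accuracy stays with the analyst.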

Example 2: Private equity

Before investing in a company, a private equity (PE) fund must conduct due diligence. To assist in this process, the fund hires a team of lawyers, who review hundreds of documents provided by investment banks and prepare a standardized report for the fund. Finally, an investment memorandum is presented to the investment committee. Digesting and analyzing all the data can take six to eight weeks before an investment is made.

The fund now uses LLMs to manage the large volume of documents in each case and to streamline its due diligence process. The LLMs search the data for answers using a query set defined by the human analysts and lawyers. Because LLMs can process large volumes of text quickly and consistently, they can surface details that human reviewers might miss. LLMs significantly increase efficiency, but human oversight is still required to ensure accuracy and adherence to the standard template. The analysts can then use the time saved to write reports that are both more concise and more thorough.

Example 3: Real estate underwriting

A bank needs to perform due diligence on the properties it is lending against. Previously, analysts manually extracted relevant information from legal reports, leases, and environmental reports to underwrite the loan and present a proposal to their investment committee. The investment committee would then review the proposal, which is always presented in a standard format.

With an LLM-based approach to IDP, real estate banks and loan funds can now define their underwriting criteria as standardized queries and run them against the dataset using a language model. As a result, they can present term sheets to borrowers faster, close deals ahead of the competition, and deliver returns to their own clients sooner.
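
Standardized queries pair naturally with structured output: one option is to ask the model to answer every criterion as a single JSON object and validate the reply before it feeds into a term sheet. The sketch below assumes that approach. The criteria, field names, `build_prompt`, `parse_answers`, and the sample reply are all hypothetical, and the actual LLM call is left out because it depends on the model provider.

```python
from __future__ import annotations

import json

# Hypothetical standardized underwriting queries: output field -> question.
CRITERIA = {
    "tenant_count": "How many tenants are named in the leases?",
    "remaining_lease_years": "How many years remain on the longest lease?",
    "environmental_issues": "Does the environmental report flag any contamination?",
}

# Expected JSON value type per field, used to validate the model's reply.
EXPECTED_TYPES = {
    "tenant_count": int,
    "remaining_lease_years": (int, float),
    "environmental_issues": bool,
}


def build_prompt(document: str) -> str:
    """Ask the model to answer all criteria at once as a single JSON object."""
    questions = "\n".join(f'- "{key}": {q}' for key, q in CRITERIA.items())
    return (
        "Answer each question using only the document below. "
        "Reply with one JSON object that uses these keys:\n"
        f"{questions}\n\nDocument:\n{document}"
    )


def has_type(value: object, expected: type | tuple[type, ...]) -> bool:
    # bool is a subclass of int in Python, so reject it for numeric fields.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def parse_answers(raw_reply: str) -> dict:
    """Parse the model's JSON reply; flag missing or mistyped fields for review."""
    data = json.loads(raw_reply)
    problems = [key for key, t in EXPECTED_TYPES.items()
                if key not in data or not has_type(data[key], t)]
    if problems:
        raise ValueError(f"needs human review: {problems}")
    return {key: data[key] for key in CRITERIA}


# The LLM call itself is omitted; this reply is a hypothetical example.
reply = '{"tenant_count": 3, "remaining_lease_years": 4.5, "environmental_issues": false}'
print(parse_answers(reply))
```

Rejecting an incomplete reply outright, rather than guessing at a missing value, routes exactly the uncertain cases back to a human underwriter.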

From Data to Decisions

As these three examples show, the potential uses of LLMs go far beyond gimmicky chat applications. When used properly, these models can extract information from large and complex documents, generate reports that summarize their findings, and assist human analysts in their work.

Many of today's LLMs can also handle rich data types such as images and tables, which lets them process a wide variety of documents. This enables comprehensive analysis of large and diverse datasets, from financial reports to legal contracts, making these models well suited to IDP.

Our platform, deepset Cloud, has a new specialized workflow that lets customers take advantage of the full range of LLM functionality in their IDP systems. Learn more about deepset Cloud or schedule a demo with our team.