Document Understanding Modern Projects User Guide

Last updated Aug 21, 2025

Key concepts

Familiarize yourself with the core concepts around UiPath® Document Understanding™.

Active learning

Active learning is our modern approach to creating models for Document Understanding™.

Active learning provides an interactive experience where the learning algorithm can query the user to label data with the desired outputs. This process helps to reduce the time and data required to train a machine-learning model by up to 80%. AI is used to guide the process, which includes automatic annotation, typically the most time-consuming task. The model also provides expert recommendations to enhance accuracy using the most informative datasets.

Figure 1. How Active Learning works

Using active learning, you can also monitor your automations through analytical capabilities.
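
The query step described above, where the learning algorithm asks the user to label the most informative data, can be illustrated with a minimal sketch of uncertainty sampling, a common active learning strategy. This is not necessarily how Document Understanding selects documents internally. The function name, file names, and confidence scores below are hypothetical and used only for illustration.

```python
from typing import Dict, List


def least_confident(predictions: Dict[str, float], k: int) -> List[str]:
    """Return the k document IDs with the lowest model confidence.

    Generic uncertainty sampling: the documents the model is least sure
    about are assumed to be the most informative ones to label next.
    """
    ranked = sorted(predictions, key=lambda doc_id: predictions[doc_id])
    return ranked[:k]


# Hypothetical confidence scores produced by a partially trained model.
scores = {
    "invoice_001.pdf": 0.97,
    "invoice_002.pdf": 0.41,
    "invoice_003.pdf": 0.88,
    "invoice_004.pdf": 0.52,
}

# Ask the user to label only the two least certain documents.
to_label = least_confident(scores, k=2)
print(to_label)  # ['invoice_002.pdf', 'invoice_004.pdf']
```

Labeling only the documents the model is unsure about is one way such a process can reduce the amount of data needed for training.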

Document types

A document type refers to the classification or categorization of a document based on its content, format, purpose, or other distinguishing factors. Examples include invoices, receipts, contracts, reports, medical records, and legal documents.

Some document types have highly structured content, while others mainly consist of free text. Based on this, documents are classified into three main formats:

Structured: documents designed to collect information in a specific format. For example, surveys, tax forms, passports, or licenses are all structured documents.
Semi-structured: documents that do not follow a strict format and are not bound to specified data fields. Semi-structured documents include invoices, receipts, utility bills, bank statements, and others.
Unstructured: documents that do not follow a specific or organized model. For example, contracts, leases, or news articles are all unstructured documents.

To learn more about document types, check the Document types section.

Generative AI

Generative AI is a form of AI technology that leverages machine learning (ML) models to create and generate new content, data, or information.

The key to most generative AI tasks is large language models (LLMs). These are ML models that are trained on a vast amount of text data and designed to generate human-like text. LLMs can also understand and respond to prompts by completing sentences or paragraphs in a human-like manner.

In the context of Document Understanding™, generative AI helps with:

Information extraction: generative AI models can be used to extract specific information from unstructured or semi-structured documents. For example, a model can go through an invoice to retrieve details like date, billed amount, and company name.
Document classification: ML models are used to auto-categorize documents based on their content. These algorithms 'read' the document, understand its context, and can classify it into predefined categories.
Data validation: generative AI can check the output of the ML model whenever the confidence score is too low. If both ML models (generative and specialized) produce the same output, a human can skip validating that document. This reduces the time spent validating documents and improves the performance of your models, because the output is checked by a second, generative model.
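
The data validation logic described above can be sketched as a small routing rule. This is a minimal illustration under stated assumptions, not the product's implementation: the FieldPrediction type, the needs_human_review function, the 0.8 threshold, and the text normalization are all hypothetical, and the sketch assumes that a high-confidence prediction is accepted without a second check.

```python
from dataclasses import dataclass


@dataclass
class FieldPrediction:
    """Hypothetical container for one extracted field."""
    value: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def _normalize(text: str) -> str:
    # Assumption: values are compared ignoring case and outer whitespace.
    return text.strip().casefold()


def needs_human_review(
    specialized: FieldPrediction,
    generative_value: str,
    threshold: float = 0.8,  # hypothetical cutoff for "too low"
) -> bool:
    """Decide whether a human should validate the extracted value."""
    # Confident predictions are accepted as-is (assumption of this sketch).
    if specialized.confidence >= threshold:
        return False
    # Low confidence: skip the human only if both models agree.
    return _normalize(specialized.value) != _normalize(generative_value)


low = FieldPrediction(value="1,250.00", confidence=0.55)
print(needs_human_review(low, "1,250.00"))  # False: both models agree
print(needs_human_review(low, "1,280.00"))  # True: models disagree
```

In this sketch, the generative model acts only as a tie-breaker for low-confidence results, so human effort is spent on the documents where the two models disagree.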

ML models

ML models are like virtual assistants that have been trained to learn from data and make predictions or decisions. These models are essentially algorithms that learn to recognize patterns based on historical data. The more data they are exposed to, the better their predictions or decisions become over time.

You can find several out-of-the-box ML models in Document Understanding™. These models help you classify and extract commonly occurring data points from semi-structured or unstructured documents, with no setup required.

Check the Pre-trained document types page for the full list of pre-trained models and their fields.

ML models can be trained on most languages, as long as the OCR recognizes the document and text with high confidence.

Optical character recognition

Optical character recognition (OCR) is a technology used to convert different types of documents, such as scanned paper documents, PDF files, or images taken by a digital camera, into editable and searchable data.

The accuracy of an OCR engine most often depends on the quality of the original document. Clear, well-formatted text in a readable font typically produces the best output.

For more information on the languages supported by the OCR engine options provided by UiPath®, check the OCR Supported Languages page.
