Document Understanding Activities
Release notes
Release date: April 16, 2025
Extract Document Data by the classified Document Type: Using an extractor based on the result of the classification operation
You can now use an extractor based on the result of the classification operation (DocumentData.DataType). In the list of extractors, choose Use Classification Result for the suggested extractor. Visit Extract Document Data for more information about using classification results to select an extractor.
To provide extraction capabilities tailored to a category of documents, you can now use a dedicated modern project type called Generative Predefined. This project type offers the following extractors:
- Short Documents Simple Layout – This is the existing Generative Extractor.
- Long Documents Complex Layout – Optimized for long-form documents that include images, handwriting, form elements, or other complex layouts, such as floating callout boxes. Examples of documents that are suitable for this extractor: insurance policies, or other similar long-form documents with complex layouts.
- Short Document Complex Layout – Optimized for short documents that include images, handwriting, form elements or other complex layouts like floating callout boxes. For example: identity cards, or healthcare intake forms with complex layouts.
For more information visit Extract Document Data.
To consume a snapshot of a model, activities and APIs now allow you to consume specific versions of your projects during classification and extraction. The Tag and Version properties provide granular control for consuming a version of a published model. For more information about consuming versions with DocumentUnderstanding.Activities, visit Classify Document and Extract Document Data. For information about exposing the version in your project, visit Document Details. For information on the newly available APIs for consuming tags and versions, visit Discovery APIs and Digitization APIs.
You can now use the Classify Document and Extract Document Data activities even if the robot is connected to a local Orchestrator. At design-time, you can use Document Understanding resources from different organizations or tenants. Similarly, at runtime, you can execute these activities while connected to a local Orchestrator in Studio.
The new Design-time external connection and Runtime external connection properties allow you to directly use external application credentials, or credentials stored in Orchestrator, to access Document Understanding resources during design-time or runtime.
- For the Generative Predefined project, the existing Generative Extractor is now called Long Document Simple Layout Extractor.
- When selecting an extractor for the Generative project types, the former Prompt collection in the Classify Document and Extract Document Data activities has been renamed to Document Type details, which describes the purpose of the collection more accurately.
- Inside the updated Document Type details collection, the former Generative prompt column is now Instruction. The Instruction field represents the instructions you want to offer about the information that should be extracted for a certain field name.
For more information about the updates, check the Classify Document and Extract Document Data activities.
- The digitization part of your document processing workflows can now recognize and return data from documents containing values that are circled, underlined, or crossed out.
Release date: November 19, 2024
Long-running classification and extraction processes failed due to a default timeout. The sum of all timeouts set in the activities is now treated as a global timeout for all operations behind the scenes. Additionally, we improved the error message so that the cause of failure is clearer in this situation.
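As a rough illustration of the behavior described above, the global timeout can be thought of as the sum of the timeouts configured on the individual activities. The sketch below is not UiPath code: the function name and the example timeout values are hypothetical and only show the arithmetic.

```python
from datetime import timedelta

def global_timeout(activity_timeouts):
    """Return the sum of the per-activity timeouts (hypothetical helper)."""
    # Start from a zero timedelta so that sum() can add timedelta values.
    return sum(activity_timeouts, timedelta())

# Hypothetical timeouts set on two activities in the same workflow.
timeouts = [timedelta(minutes=5), timedelta(minutes=10)]
print(global_timeout(timeouts))
```

Under this reading, increasing the timeout of any single activity also increases the overall time budget for the whole operation.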
Release date: October 3, 2024
Disable the generation of Document Data on demand, for enhanced flexibility in advanced implementations
When Generate Data Type is set to False, the output type changes from IDocumentData<ExtractorType> to IDocumentData<DictionaryData>. Setting Generate Data Type to False simplifies retrieving and changing field values, and allows you to change the document type in the Validation Station.
Visit the Extract Document Data and Document data pages to check how to use the Generate Data Type property and what methods you can use to access the extraction results.
You can now set the Orchestrator storage bucket to work with Additional options for the Create Classification Validation Task and Create Classification Validation Task and Wait activities. If there is no specific storage bucket created, you can create a default one.
- Running a Studio Web workflow on a Mac robot failed with the following exception: "Could not load file or assembly 'UiPath.DocumentUnderstanding.Common.SDK'".
- The activities responsible for creating Action Center tasks crashed when manually correcting certain numbers or dates in a document.
- Fixed an error that occurred in Studio Web when resuming a workflow. Previously, this error happened after the first validation of a document classification, which involved retrieving files from Microsoft OneDrive and validating them in Action Center.
Release date: October 3, 2024
We've improved product stability by updating our common dependencies to the most recent versions. This upgrade is automatic and doesn't require any action from your side.
Release date: 5 June 2024
We've improved product stability by updating our common dependencies to the most recent versions. This upgrade is automatic and doesn't require any action from your side.
Release date: 27 May 2024
- Increased prompt size from 500 to 1000 characters per question for enhanced clarity in your instructions. Also, if you reach the prompt size limit of 1000 characters per question, you will receive a "Limit exceeded" error.
- Enhanced the error messages for the Extract Document Data activity for increased clarity and easier debugging.
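If you build prompts programmatically, a length check before passing them to the activity can surface the limit early. This is a minimal sketch, not part of the package: the constant and function names are hypothetical, and it assumes that only prompts longer than 1000 characters trigger the error.

```python
MAX_PROMPT_CHARS = 1000  # per-question limit stated in the release note

def check_prompt(prompt: str) -> None:
    """Raise ValueError if the prompt is longer than the assumed limit."""
    # Assumption: a prompt of exactly 1000 characters is still accepted.
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Limit exceeded: {len(prompt)} characters (maximum {MAX_PROMPT_CHARS})"
        )

check_prompt("Extract the invoice number from the header.")  # passes silently
```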
Release date: 29 April 2024
These release notes contain all the updates made to the UiPath.DocumentUnderstanding.Activities package between November 2023 and March 2024.
Enhancing extraction confidence for Extract Document Data
We've improved our Extract Document Data activity to increase score accuracy and decrease validation time. This enhancement adds the Auto-validation and Confidence threshold properties, enabling the cross-verification of extraction results from specific models against a generative model.
Visit Extract Document Data to learn how to increase your extraction confidence levels.
Classification Validation tasks
We are excited to announce that the following activities are now available:
- Create Classification Validation Task: Allows you to create a validation task in Action Center.
- Create Classification Validation Task and Wait: Allows you to create a validation task in Action Center and pause the workflow until completion.
- Wait for Classification Validation Task and Resume: Allows you to suspend the execution of the current workflow until a specified document validation action is completed.
- You can now set the Orchestrator storage bucket to work with Additional options for the Create Validation Task and Wait and Create Validation Task activities. If there is no specific storage bucket created, you can create a default one.
- The optional Timeout property is now available for the Extract Document Data and Classify Document activities. This parameter configures a timeout for the activities.
- The selected extractor in the Extract Document Data activity now overrides the document type. This doesn't apply to generative models.
- In case of multi-value fields, all values are returned under Document Data for the Extract Document Data activity. The values are available in DocumentData.Data.FieldName.MultiValues[].
- This release brings the following updates to the Document Data object:
  - The Name property from the Document Type attribute is replaced with the following:
    - DisplayName for custom models
    - ID for out-of-the-box models
  - Two new properties are added, populated from the result of the Document Understanding framework:
    - ID
    - DisplayName
The existing Document Understanding Insights dashboards, currently in preview, no longer display data from the cross-platform DocumentUnderstanding.Activities package. They now only report data from IntelligentOCR.Activities workflows.
Data from the cross-platform DocumentUnderstanding.Activities are now reported in a separate, new Insights dashboard.
- The ClassificationResults output property of the Create Classification Validation Task activity is renamed to DocumentData.
CAUTION: If your current workflow uses the CreatedClassificationValidationTask.ClassificationResults property, it won't be available after the upgrade.
- The output property ExtractionResults of the Create Validation Task activity is renamed to DocumentData.
CAUTION: If your workflow uses the CreatedDocumentValidationTask.ExtractionResults property, it won't be available after the upgrade.
- Fixed an issue where the Wait for Validation Task and Resume activity didn't recognize numbers in the 3.1342,7 format, causing an "Input string was not in correct format" error. All number formats are now fully supported.
Release date: 1 November 2023
- Classify Document activity
- Extract Document Data activity
Release date: 11 May 2023
- We've fixed a bug that was causing the Extract Document Data activity to stop loading when a template was used.
- We fixed a bug where ActionCatalog fields wouldn't work for the Create Document Validation Action and Create Validation Task and Wait activities.
- We fixed a bug that was causing an error when the Classify Document activity was used in a workflow.
Release date: 11 May 2023
We fixed a bug where users would get an error when trying to use the value of a field extracted with the Extract Document Data activity and the value wasn't present.
Release date: 5 May 2023
The Extract PDF Text activity is the latest addition to the package, allowing you to extract all characters from a specified PDF file and store them in a string variable. When the Apply OCR option is enabled, it extracts the information using OCR; when disabled, it extracts the native content.
Two activities had their names updated and one activity has updated fields:
- Create Document Validation Task became Create Validation Task.
- Wait for Document Validation Task and Resume became Wait for Validation Task and Resume.
- Set PDF Password now offers more detailed field names, such as:
  - New Manage Password
  - New Open Password
  - Current Manage Password
  - Current Open Password
- v2.14.0
- Extract Document Data by the classified Document Type: Using an extractor based on the result of the classification operation
- What's new
- Enhanced extraction capabilities using new extractors
- Using tags and versions for referencing Document Understanding modern projects
- Support for activities from an on-premises setup
- Improvements
- v2.12.1
- Bug fixes
- v2.4.5
- Bug fixes
- v2.9.6
- Bug fixes
- v2.12.0
- What's new
- Disable the generation of Document Data on demand, for enhanced flexibility in advanced implementations
- Improvements
- Bug fixes
- v2.2.6
- v2.9.5
- Bug fixes
- v2.9.4
- Bug fixes
- v2.9.3
- Bug fixes
- v2.4.3
- v2.9.2
- v2.9.1
- Improvements
- Bug fixes
- v2.9.0
- What's New
- Improvements
- Known limitations
- Bug fixes
- v2.4.2
- Bug fixes
- v2.4.1
- Generative Features General Availability
- New Features and Improvements
- v2.4.0
- Document Understanding Activities general availability
- v2.2.4
- New features and improvements
- v2.2.3
- Bug Fixes
- v2.2.2
- Bug Fixes
- Known issues
- v2.2.1
- New features and Improvements