Unstructured and complex documents user guide

Last updated Nov 24, 2025

Extracting data from unstructured documents

The Unstructured and complex documents capability extends data extraction to complex unstructured documents. It uses generative AI to map the fields and field groups defined in the extraction schema and to predict their values with confidence and accuracy. It can extract data from intricate elements such as complex tables, charts, or graphs, and it returns the results as structured output.

The process involves:

Reviewing initial model predictions.
Modifying prompt instructions iteratively based on review outcomes.
Annotating documents to gather ground truth for validation and to guide further refinement of extraction performance.
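
As an illustration of how annotated ground truth can be used for validation, the sketch below compares model predictions with annotations and reports a per-field accuracy. It is a minimal example under stated assumptions, not the product's own scoring method: the function name field_accuracy, the dictionary layout, the sample document IDs, and the case-insensitive exact-match rule are all hypothetical.

```python
from collections import defaultdict


def field_accuracy(predictions, ground_truth):
    """Return, per field, the share of annotated values the model matched.

    Both arguments map a document ID to a {field_name: value} dictionary.
    This layout is an assumption made for the example.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, truth_fields in ground_truth.items():
        predicted_fields = predictions.get(doc_id, {})
        for name, truth_value in truth_fields.items():
            total[name] += 1
            predicted = predicted_fields.get(name)
            # Hypothetical matching rule: ignore case and surrounding spaces.
            if predicted is not None and (
                predicted.strip().lower() == truth_value.strip().lower()
            ):
                correct[name] += 1
    return {name: correct[name] / total[name] for name in total}


# Illustrative data only; these are not real documents or results.
ground_truth = {
    "contract_001": {"effective_date": "2025-01-15", "party_name": "Acme Corp"},
    "contract_002": {"effective_date": "2025-03-01", "party_name": "Globex Ltd"},
}
predictions = {
    "contract_001": {"effective_date": "2025-01-15", "party_name": "ACME Corp"},
    "contract_002": {"effective_date": "2025-03-10", "party_name": "Globex Ltd"},
}

scores = field_accuracy(predictions, ground_truth)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
```

A field with a low score is a natural candidate for the next round of prompt instruction changes.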

Extracting data from unstructured documents, such as contracts, long invoices, or other similar documents, requires a systematic and intelligent approach due to the variations in format, language, and layout.

The process begins with providing clear instructions that guide the extraction model in identifying, interpreting, and extracting relevant information. Writing these instructions, often referred to as prompt engineering or extractions, plays a critical role in ensuring that the AI model interprets and processes the content accurately.

These instructions include:

Defining target data fields, such as dates, names, amounts, clauses, and so on.
Providing contextual cues or examples for the AI to recognize similar patterns.
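
The two kinds of instruction above can be pictured as a small data structure: each target field carries a plain-language instruction and optional examples. The sketch below is purely illustrative. FieldInstruction, render_instructions, and the sample field names are hypothetical and do not reflect how the product stores or formats its extraction schema.

```python
from dataclasses import dataclass, field


@dataclass
class FieldInstruction:
    """One target field with its instruction and optional example values."""
    name: str
    instruction: str
    examples: list[str] = field(default_factory=list)


def render_instructions(fields):
    """Join the field instructions into one text block, one field per line."""
    lines = []
    for item in fields:
        line = f"{item.name}: {item.instruction}"
        if item.examples:
            # Examples act as contextual cues for recognizing similar patterns.
            line += " Examples: " + "; ".join(item.examples)
        lines.append(line)
    return "\n".join(lines)


# Hypothetical fields for a contract; adjust to the documents at hand.
contract_fields = [
    FieldInstruction(
        name="effective_date",
        instruction="The date on which the contract takes effect.",
        examples=["January 15, 2025", "2025-01-15"],
    ),
    FieldInstruction(
        name="termination_clause",
        instruction="The full text of the clause describing how the contract can be ended.",
    ),
]

rendered = render_instructions(contract_fields)
print(rendered)
```

Keeping the instruction and its examples together per field makes it easier to revise one field at a time during iterative review.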

By combining detailed prompt engineering, iterative feedback, and the reasoning power of generative AI, this approach significantly improves the extraction of structured information from unstructured and variable documents.
