Unstructured and complex documents user guide
Introduction
UiPath® IXP offers multi-modal data classification and extraction that unlocks enterprise data at speed and scale. This allows you to process various document types through a set of capabilities you can choose from:
- Communications Data (Communications Mining)
- Structured and semi-structured documents (Document Understanding)
- Unstructured and complex documents (Generative Extraction)
For more details, check Capability types, Choosing the correct capability, and Accessing capabilities in the IXP guide.
Extracting data from complex, unstructured documents poses different challenges depending on the stage you are at:
During model building and testing
- Access required to different LLMs for testing across various document types, that is, use cases.
- Difficult to quickly iterate on a user prompt and view prediction results.
- Access required to more parameters to optimize prediction results.
- Difficult to validate predictions and provide ground truth.
- Time-consuming to assess model and field-level metrics.
- Need to compare the impact of adjusting model parameters, for example, model, user prompt, chunk size, zero-shot, or few-shot.
- Need to build visualizations of performance metrics, for example, precision, recall, and F1 score.
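For reference, the performance metrics named in the list above (precision, recall, and F1 score) can be computed from counts of correct and incorrect predictions. The following is a minimal sketch, not the way IXP calculates its metrics. The FieldCounts class, its field names, and the sample numbers are hypothetical and used only for illustration, and how a prediction is counted as correct is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FieldCounts:
    """Hypothetical tally of validated predictions for one field."""
    true_positives: int   # predicted value matches the ground truth
    false_positives: int  # a value was predicted, but it is wrong or not expected
    false_negatives: int  # a ground truth value exists, but it was not predicted


def precision(c: FieldCounts) -> float:
    """Share of predicted values that are correct."""
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def recall(c: FieldCounts) -> float:
    """Share of ground truth values that were correctly predicted."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def f1_score(c: FieldCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts for a single field, for illustration only.
counts = FieldCounts(true_positives=8, false_positives=2, false_negatives=4)
print(f"precision={precision(counts):.2f}")  # precision=0.80
print(f"recall={recall(counts):.2f}")        # recall=0.67
print(f"f1={f1_score(counts):.2f}")          # f1=0.73
```

Precision drops when the model predicts values that are wrong, recall drops when it misses values that exist in the document, and the F1 score balances the two.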
During production deployment
- No easy way to modify and maintain data schema for a use case.
- Little model governance, for example, the ability to revert to a previous version, or to select a particular version for production.
This section introduces the Unstructured and complex documents capability, which uses Generative Extraction to process complex, unstructured documents and helps you solve the challenges outlined above.
This capability is ideal for advanced document processing scenarios where data is not consistently formatted. Use this capability when:
- Documents contain paragraphs of free-form text or complex elements, such as:
  - Complex tables.
  - Graphics.
  - Charts.
  - Checkboxes.
  - Call-out boxes.
  - Signatures.
  - Handwriting, and so on.
- You need to extract inferred values, which are information that is not stated directly but must be derived from context.
- There is high variation in layout or structure between documents or within fields to be extracted.
- You are dealing with stacks of multiple document types combined as one file and need to extract data without splitting them up first.
For more details on the Unstructured and complex documents capability, check Choosing the correct capability in the IXP overview guide.
This capability provides the following benefits:
- Enhanced support for complex data such as tables and long unstructured documents.
- Reduced time to production.
- More configurable and flexible experience with controls for prompts, LLMs, and model settings.
- Improved model performance evaluation.
- AI guardrails and production-grade model governance.
The Unstructured and complex documents capability is suitable for a wide range of scenarios across industries and departments, including the following:
Legal services
- Employment agreements
- Operating agreements
- Investment agreements
Healthcare
- Physician statements
- Emergency room reports
- Patient referrals
Retail
- Refund requests
- Customer complaints
- Product catalogues
Real estate
- Leases
- Mortgages
- Property appraisal reports
Finance or banking
- Brokerage statements
- Loan applications
- Credit reports
Manufacturing
- Change orders
- Product specifications
- Supply orders
Insurance
- Insurance policies
- Claims
- Coverage denial letters
Technology or telecommunication
- Service agreements
- Software licences
- Incident reports