
Unstructured and complex documents user guide
Extracting data from unstructured documents
The Unstructured and complex documents capability extends data extraction to complex, unstructured documents. It uses generative AI to map the fields and field groups defined in the extraction schema and to predict their values. It can also extract data from intricate elements such as complex tables, charts, or graphs, and return it as structured output.
The process involves:
- Reviewing initial model predictions.
- Modifying prompt instructions iteratively based on review outcomes.
- Annotating documents to gather ground truth for validation and to guide further refinement of data extraction performance.
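The review step above can be made concrete by comparing model predictions with annotated ground truth, field by field. The following is a minimal sketch in plain Python, not the product's own validation logic: the function names, the dictionary layout, the sample documents, and the case-insensitive matching rule are all assumptions chosen for illustration.

```python
from collections import defaultdict


def _normalize(value):
    """Compare values loosely: ignore case and surrounding whitespace (an assumed rule)."""
    return "" if value is None else str(value).strip().lower()


def field_accuracy(predictions, ground_truth):
    """Return, per field, the share of annotated documents where the prediction matches."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, truth in ground_truth.items():
        predicted = predictions.get(doc_id, {})
        for field_name, expected in truth.items():
            total[field_name] += 1
            if _normalize(predicted.get(field_name)) == _normalize(expected):
                correct[field_name] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical annotations (ground truth) and model predictions for two documents.
ground_truth = {
    "doc-1": {"effective_date": "2024-01-15", "party_name": "Acme Corp"},
    "doc-2": {"effective_date": "2023-11-02", "party_name": "Globex"},
}
predictions = {
    "doc-1": {"effective_date": "2024-01-15", "party_name": "acme corp "},
    "doc-2": {"effective_date": "2023-11-20", "party_name": "Globex"},
}

scores = field_accuracy(predictions, ground_truth)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.0%}")
```

A field with a low score, such as `effective_date` in this made-up data, is a natural candidate for a revised prompt instruction in the next iteration.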
Extracting data from unstructured documents, such as contracts or long invoices, requires a systematic and intelligent approach because these documents vary in format, language, and layout.
The process begins with providing clear instructions that guide the extraction model in identifying, interpreting, and extracting relevant information. Writing these instructions, often referred to as prompt engineering or extractions, plays a critical role in ensuring that the AI model interprets and processes the content accurately.
These instructions include:
- Defining target data fields, such as dates, names, amounts, clauses, and so on.
- Providing contextual cues or examples for the AI to recognize similar patterns.
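As an illustration of the two points above, the sketch below describes target fields together with their instructions and examples, then renders them into a single block of instruction text. It is a hypothetical representation only: `FieldSpec`, `build_prompt`, and the contract fields shown are invented for this example and do not reflect the product's actual schema format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldSpec:
    """One target data field and the instruction that guides its extraction."""
    name: str
    data_type: str
    instruction: str
    examples: List[str] = field(default_factory=list)


def build_prompt(fields):
    """Render the field definitions as plain-text extraction instructions."""
    lines = ["Extract the following fields from the document:"]
    for spec in fields:
        lines.append(f"- {spec.name} ({spec.data_type}): {spec.instruction}")
        for example in spec.examples:
            lines.append(f"  Example: {example}")
    return "\n".join(lines)


# Hypothetical fields for a contract; adjust the names and wording to your own schema.
CONTRACT_FIELDS = [
    FieldSpec(
        name="effective_date",
        data_type="date",
        instruction="The date on which the agreement takes effect, not the signature date.",
        examples=["This Agreement is effective as of January 15, 2024."],
    ),
    FieldSpec(
        name="termination_clause",
        data_type="text",
        instruction="The full clause describing how either party may end the agreement.",
    ),
]

prompt = build_prompt(CONTRACT_FIELDS)
print(prompt)
```

Keeping each instruction next to its field makes it easier to revise a single field's wording after a review round without touching the rest.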
Combining detailed prompt engineering, iterative feedback, and the reasoning power of generative AI improves the extraction of structured information from unstructured and variable documents.