Document Understanding - Data Extraction Validation Overview

2022.10

Document Understanding User Guide

Data Extraction Validation Overview

After automatic data extraction, an optional but highly recommended step is validating the extracted data.

This is a human review step in which knowledge workers review the automatically extracted results and correct them where necessary.

Using Data Extraction Validation ensures that the resulting structured data is 100% correct.

It is strongly recommended to use the Data Extraction Validation components when:

- you need 100% accuracy on the data;
- you have no other source of truth against which to double-check the automatically extracted information (for example, checking that an extracted Name or Address matches a Name or Address already confirmed in a database);
- you do not have sufficient synthetic checks for data consistency (for example, checking that line items add up to the total, or that an ID number's checksum is valid).
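The two synthetic checks mentioned above can be sketched in Python as follows. This is a minimal illustration, not part of Document Understanding: the function names are hypothetical, amounts are assumed to arrive as plain decimal strings such as "15.80", and the Luhn algorithm is used only as an example of an ID checksum. Real ID formats may use a different checksum.

```python
from decimal import Decimal, InvalidOperation


def line_items_match_total(line_amounts: list[str], total: str) -> bool:
    """Return True if the extracted line-item amounts add up to the extracted total.

    Amounts are parsed with Decimal instead of float so that, for example,
    "0.10" + "0.20" compares exactly equal to "0.30". Any value that cannot be
    parsed makes the check fail, so the document can be routed to validation.
    """
    try:
        items_sum = sum((Decimal(amount) for amount in line_amounts), Decimal("0"))
        return items_sum == Decimal(total)
    except InvalidOperation:
        return False


def luhn_checksum_ok(id_number: str) -> bool:
    """Return True if id_number passes the Luhn checksum (illustrative choice only)."""
    digits = id_number.replace(" ", "")
    # Accept only ASCII digits 0-9; anything else cannot be checked.
    if not (digits.isascii() and digits.isdigit()):
        return False
    checksum = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


print(line_items_match_total(["10.00", "5.50", "0.10", "0.20"], "15.80"))  # True
print(luhn_checksum_ok("79927398713"))  # True
print(luhn_checksum_ok("79927398710"))  # False
```

When checks like these are available and pass, they can replace part of the human review; when they fail, or when a value cannot be parsed at all, the document is a candidate for Validation Station.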
  Note:
  If you need 100% accuracy, we strongly recommend adding the Validation step whenever possible.
  
  If validation is not an option for every document:
  - double-check as much of the extracted information as possible;
  - decide on specific confidence thresholds that the business use case can accept for individual fields;
  - before making a decision on a value, always check both its Extraction Confidence and its OCR Confidence.
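The threshold decision described in the note can be sketched as follows, assuming that both confidences are reported on a 0.0 to 1.0 scale and that the business has agreed on a threshold per field. `ExtractedField`, `fields_needing_review`, and `needs_validation` are hypothetical names for illustration, not Document Understanding APIs.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical container for one extracted value and its two confidences."""
    name: str
    value: str
    extraction_confidence: float  # assumed to be in the range 0.0-1.0
    ocr_confidence: float         # assumed to be in the range 0.0-1.0


def fields_needing_review(
    fields: list[ExtractedField], thresholds: dict[str, float]
) -> list[str]:
    """Return the names of fields that should be reviewed by a human.

    A field passes only when BOTH its extraction confidence and its OCR
    confidence meet the threshold agreed for that field. Fields with no
    agreed threshold are always flagged rather than silently accepted.
    """
    flagged = []
    for field in fields:
        threshold = thresholds.get(field.name)
        if threshold is None:
            flagged.append(field.name)
        elif min(field.extraction_confidence, field.ocr_confidence) < threshold:
            flagged.append(field.name)
    return flagged


def needs_validation(
    fields: list[ExtractedField], thresholds: dict[str, float]
) -> bool:
    """Return True if any field is flagged, i.e. the document needs validation."""
    return bool(fields_needing_review(fields, thresholds))


if __name__ == "__main__":
    fields = [
        ExtractedField("Total", "15.80", extraction_confidence=0.97, ocr_confidence=0.99),
        ExtractedField("Vendor Name", "Contoso", extraction_confidence=0.95, ocr_confidence=0.62),
        ExtractedField("Invoice Date", "2022-10-03", extraction_confidence=0.90, ocr_confidence=0.93),
    ]
    thresholds = {"Total": 0.95, "Vendor Name": 0.80}
    print(fields_needing_review(fields, thresholds))  # prints ['Vendor Name', 'Invoice Date']
```

Taking the minimum of the two confidences means a value is trusted only when both the OCR reading and the extraction are confident; a high Extraction Confidence on a poorly read value still sends the document to human validation.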

A human can validate the automatically extracted data by using Validation Station.

Validation Station is available in two forms:

- as an attended activity, through the Present Validation Station activity;
- as Action Center tasks, through the Create Document Validation Action and Wait for Document Validation Action and Resume activities.
