Document Understanding User Guide for Automation Cloud Dedicated
From the Measure section, you can check the overall status of your project and identify the areas with potential for improvement.
The main measurement on the page is the overall Project score.
This score factors in the classifier and extractor scores for all document types. Each of these scores corresponds to a model rating and can be viewed in Classification Measure and Extraction Measure, respectively. Ratings map to score ranges as follows:
- Poor (0-49)
- Average (50-69)
- Good (70-89)
- Excellent (90-100)
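The rating bands above can be expressed as a simple lookup. This is a minimal sketch: the function name `rating_for_score` is hypothetical, and it assumes that a fractional score stays in the band whose lower bound it has reached (for example, 49.5 is still Poor), since the bands are documented only as integer ranges.

```python
def rating_for_score(score: float) -> str:
    """Return the rating band for a 0-100 model score.

    Assumption: fractional scores fall into the band whose lower bound
    they have reached (for example, 49.5 is still Poor).
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score < 50:
        return "Poor"
    if score < 70:
        return "Average"
    if score < 90:
        return "Good"
    return "Excellent"


if __name__ == "__main__":
    # Scores on both sides of each band boundary.
    for s in (35, 49, 50, 69, 70, 89, 90):
        print(s, rating_for_score(s))
```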
Regardless of the model score, it is up to you to decide when to stop training, based on your project needs. A model rated as Excellent does not necessarily meet all of your business requirements.
The Classification score factors in the performance of the model as well as the size and quality of the dataset. The Classification Measure view contains the following tabs:
- Factors: Provides recommendations on how to improve the performance of your model. You can get recommendations on dataset size or trained model performance for each document type.
- Metrics: Provides useful metrics, such as the number of train and test documents, precision, accuracy, recall, and F1 score for each document type.
The Extraction score factors in the overall performance of the model as well as the size and quality of the dataset. This view is split by document type, and you can go straight to the Annotate view of each document type by selecting Annotate. The Extraction Measure view contains the following tabs:
- Factors: Provides recommendations on how to improve the performance of your model. You can get recommendations on dataset size (number of uploaded documents, number of annotated documents) or trained model performance (fields accuracy) for the selected document type.
- Dataset: Provides information about the documents used for training the model, the total number of imported pages, and the total number of labelled pages.
- Metrics: Provides useful information and metrics for the selected document type, such as the field name, the training status, and the accuracy. You can also download advanced metrics for your extraction models using the Download advanced metrics button, which produces an Excel file with detailed metrics and model results per batch.
Dataset diagnostics
The Dataset tab helps you build effective datasets by providing feedback and recommendations on the steps needed to achieve good accuracy for the trained model.
There are four dataset status levels exposed in the Management bar:
- Red - More labelled training data is required.
- Orange - More labelled training data is recommended.
- Light green - Labelled training data is within recommendations.
- Dark green - Labelled training data is within recommendations. However, more data might be needed for underperforming fields.
If no fields are created in the session, the dataset status level is grey.
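To illustrate how the status levels relate to each other, the sketch below orders the checks from grey to dark green. It is hypothetical throughout: the product does not document the exact inputs or thresholds behind each level, so `labelled_pages`, `required_pages`, `recommended_pages`, and `has_underperforming_fields` are illustrative assumptions, not the actual criteria.

```python
from enum import Enum


class DatasetStatus(Enum):
    GREY = "No fields created in the session"
    RED = "More labelled training data is required"
    ORANGE = "More labelled training data is recommended"
    LIGHT_GREEN = "Labelled training data is within recommendations"
    DARK_GREEN = ("Labelled training data is within recommendations; "
                  "underperforming fields might need more data")


def dataset_status(field_count: int, labelled_pages: int,
                   required_pages: int, recommended_pages: int,
                   has_underperforming_fields: bool) -> DatasetStatus:
    """Hypothetical mapping from dataset counts to a status level.

    All inputs and thresholds are illustrative assumptions, not the
    product's actual criteria.
    """
    if field_count == 0:
        return DatasetStatus.GREY
    if labelled_pages < required_pages:
        return DatasetStatus.RED
    if labelled_pages < recommended_pages:
        return DatasetStatus.ORANGE
    if has_underperforming_fields:
        return DatasetStatus.DARK_GREEN
    return DatasetStatus.LIGHT_GREEN


if __name__ == "__main__":
    # Hypothetical thresholds: 10 pages required, 20 recommended.
    print(dataset_status(3, 15, 10, 20, False).name)
```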
You can compare the performance of two versions of a classification or extraction model from the Measure section.
Classification model comparison
To compare the performance of two versions of a classification model, first navigate to the Measure section. Then, select Compare model for the classification model you are interested in.
You can choose the versions you want to compare from the drop-down list at the top of each column. By default, the current version (the most recent version available) is selected on the left, and the most recent published version is selected on the right. The comparison shows the following measures for each document type:
- Precision: the ratio of correctly predicted positive instances to the total number of instances predicted as positive. A high precision indicates fewer false positives.
- Accuracy: the ratio of correct predictions (including both true positives and true negatives) to the total number of instances.
- Recall: the proportion of actual positive cases that were correctly identified.
- F1 score: the harmonic mean of precision and recall, which balances the two metrics. This serves as a trade-off between false positives and false negatives.
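The four measures above can be computed from one-vs-rest confusion counts for a single document type. This is a minimal sketch, not the product's implementation: the `ConfusionCounts` type and `classification_metrics` function are hypothetical, the counts in the usage example are made up, and returning 0.0 when a denominator is zero is an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """One-vs-rest counts for a single document type."""
    tp: int  # documents of this type classified as this type
    fp: int  # documents of other types classified as this type
    fn: int  # documents of this type classified as another type
    tn: int  # documents of other types not classified as this type


def _ratio(num: float, den: float) -> float:
    # Assumption: report 0.0 when a denominator is zero.
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "accuracy": accuracy,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    m = classification_metrics(ConfusionCounts(tp=30, fp=10, fn=20, tn=40))
    print({k: round(v, 3) for k, v in m.items()})
```

With these example counts, precision is 0.75 and recall is 0.6, so F1 is about 0.667, slightly below their geometric mean of about 0.671. The harmonic mean penalizes an imbalance between the two metrics more strongly.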
The order of document types displayed is the one used in the latest version from the comparison. If a document type is not available in one of the compared versions, the values for each measure are replaced with N/A.
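The ordering and N/A rules above can be sketched as a small table builder. This is an illustration under assumptions: `build_comparison` and its inputs are hypothetical, the values are made up, and the source does not say where document types that exist only in the older version appear, so this sketch appends them after the latest version's types. It requires Python 3.9 or later for the built-in generic type hints.

```python
Metrics = dict[str, float]


def _cells(version: dict[str, Metrics], name: str,
           measures: list[str]) -> list[str]:
    values = version.get(name)
    if values is None:
        # The document type does not exist in this version.
        return ["N/A"] * len(measures)
    return [f"{values[m]:.2f}" for m in measures]


def build_comparison(latest: dict[str, Metrics], other: dict[str, Metrics],
                     measures: list[str]) -> list[tuple[str, list[str], list[str]]]:
    # Rows follow the latest version's order; types that exist only in the
    # other version are appended afterwards (an assumption).
    order = list(latest) + [name for name in other if name not in latest]
    return [(name, _cells(latest, name, measures), _cells(other, name, measures))
            for name in order]


if __name__ == "__main__":
    latest = {"Invoices": {"precision": 0.92, "recall": 0.88},
              "Receipts": {"precision": 0.81, "recall": 0.79}}
    published = {"Receipts": {"precision": 0.77, "recall": 0.75},
                 "Bank Statements": {"precision": 0.70, "recall": 0.66}}
    for row in build_comparison(latest, published, ["precision", "recall"]):
        print(row)
```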
Extraction model comparison
To compare the performance of two versions of an extraction model, first navigate to the Measure section. Then, select Compare model for the extraction model you are interested in.
You can choose the versions you want to compare from the drop-down list at the top of each column. By default, the current version (the most recent version available) is selected on the left, and the most recent published version is selected on the right. The comparison shows the following information for each field:
- Field name: the name of the annotation field.
- Content type: the content type of the field:
  - String
  - Number
  - Date
  - Phone
  - ID Number
- Rating: model score intended to help you visualize the performance of the extracted field.
- Accuracy: the fraction of the model's predictions for the field that are correct.
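Field accuracy, as defined above, is the number of correct predictions for a field divided by the total number of predictions for that field. The sketch below computes it from (field, predicted, expected) triples. The function name and field names are hypothetical, and counting only exact string matches as correct is an assumption; the product's matching rules for each content type are not described here.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def field_accuracy(predictions: Iterable[Tuple[str, str, str]]) -> dict:
    """Compute per-field accuracy from (field, predicted, expected) triples.

    Assumption: a prediction counts as correct only on an exact string match.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for field, predicted, expected in predictions:
        total[field] += 1
        if predicted == expected:
            correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


if __name__ == "__main__":
    sample = [
        ("invoice-number", "INV-001", "INV-001"),
        ("invoice-number", "INV-02", "INV-002"),  # incorrect prediction
        ("total-amount", "120.50", "120.50"),
        ("total-amount", "99.00", "99.00"),
    ]
    print(field_accuracy(sample))
```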
The order of field names displayed is the one used in the latest version from the comparison. If a field name is not available in one of the compared versions, the values for each measure are replaced with N/A.
You can also compare the field score for tables from the Table section.
You can download the advanced metrics file for each version from the comparison page using the Download advanced metrics button.