Document Understanding User Guide
Retraining extractors
Feature availability depends on the cloud platform that you use. For details, refer to the Choosing the deployment type page.
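You can further improve the performance of your model by using documents that were validated in Validation Station.
You can retrain on documents that were processed with the following activity packages:
- UiPath.DocumentUnderstanding.Activities: all documents that were processed using this activity package and were validated in Validation Station are collected automatically and can be used for retraining.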
- UiPath.IntelligentOCR.Activities (starting with version 6.25.0-preview): to retrain documents processed using this activity package, use the Document Understanding Project Extractor Trainer activity in your workflow. This allows documents to be collected for retraining purposes.
The Exceptions for review button is now always visible for the corresponding document type within the Build section. If no documents have been collected, the button remains available and displays a count of 0.
Collected documents are not automatically included in the training set. To retrain your model, you need to review the documents and confirm their addition to the training set.
Exceptions for review
Follow the steps in this procedure to fine-tune your model using documents from Validation Station.
Documents collected for exceptions are stored for a period of 90 days, after which they are automatically deleted. Documents are not collected for validation tasks with a duration greater than 7 days.
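The retention and collection limits above can be sketched as a small check. This is only an illustration, not product code: the function and constant names are hypothetical, the 90 is assumed to be a number of days, and the task duration is assumed to run from task creation to task completion.

```python
from datetime import datetime, timedelta

# Limits taken from the text above; all names here are illustrative only.
RETENTION = timedelta(days=90)         # assumption: the stated 90 is in days
MAX_TASK_DURATION = timedelta(days=7)  # tasks longer than this are not collected


def is_collected(task_created: datetime, task_completed: datetime) -> bool:
    """Return True if the validation task is short enough for its document to be collected."""
    return task_completed - task_created <= MAX_TASK_DURATION


def deletion_date(collected_at: datetime) -> datetime:
    """Return the point in time after which a collected document is deleted."""
    return collected_at + RETENTION


created = datetime(2025, 1, 1)
print(is_collected(created, datetime(2025, 1, 5)))   # True
print(is_collected(created, datetime(2025, 1, 10)))  # False
print(deletion_date(datetime(2025, 1, 5)).date())    # 2025-04-05
```

A check like this can help when planning review sessions, so that collected documents are reviewed before they expire.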
- Select the Exceptions for review button.
- Check the exception documents from the Exceptions for fine-tuning menu.
The following information is available for each document:
- File name: the file name of the document containing an exception.
- Status: the status of the document.
- Pages: the number of pages that the document contains.
- Project version: the project version that contains the document.
- Processed date: the date when the document was processed.
- No of extracted fields: the number of extracted fields for the document containing an exception.
- No of corrected fields: the number of fields modified during the validation step.
- Validator name: the username of the person who validated the document.
Note: The Exceptions for fine-tuning list may include documents imported from other Document Understanding projects or environments.

- Select the document that you want to use for fine-tuning from the list.
The following information is available for each document:
- All extracted fields are displayed, along with their model confidence. If the confidence value is N/A, this indicates that the field was not automatically extracted and was instead manually added by the validator.
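- Corrected fields are marked with a yellow dot next to the extracted field name.
- The following information is available for all corrected fields: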
- Predicted value: the value predicted by the model.
- Corrected value: if changed manually, the value after validation. If the document type was not changed after validation, the value will be N/A.
- Reference: the original value that is highlighted on the document. This value is used for annotation if you decide to use the document for fine-tuning.
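- You can sort by confidence to quickly review low-confidence fields that have no corrections.
- You can also filter for corrected fields to focus on them.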
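The sorting and filtering described above can be modeled with a few lines of Python. This is a minimal sketch under stated assumptions: the `ExtractedField` class, the helper functions, the sample values, and the 0.8 threshold are all hypothetical and are not part of Document Understanding. A confidence of `None` stands in for the N/A shown for fields that the validator added manually.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedField:
    name: str
    predicted: Optional[str]     # value predicted by the model, if any
    corrected: Optional[str]     # None when the validator left the value unchanged
    confidence: Optional[float]  # None models N/A (field added manually)

    @property
    def is_corrected(self) -> bool:
        return self.corrected is not None


def low_confidence_uncorrected(fields, threshold=0.8):
    """Uncorrected fields below the (hypothetical) threshold, lowest confidence first."""
    picked = [
        f for f in fields
        if f.confidence is not None and f.confidence < threshold and not f.is_corrected
    ]
    return sorted(picked, key=lambda f: f.confidence)


def corrected_only(fields):
    """Fields whose value was changed or added during validation."""
    return [f for f in fields if f.is_corrected]


fields = [
    ExtractedField("invoice-no", "INV-001", None, 0.97),
    ExtractedField("total", "120.00", "1200.00", 0.55),
    ExtractedField("due-date", "2025-03-01", None, 0.42),
    ExtractedField("po-number", None, "PO-77", None),
]

print([f.name for f in low_confidence_uncorrected(fields)])  # ['due-date']
print([f.name for f in corrected_only(fields)])              # ['total', 'po-number']
```

The first view surfaces values that passed validation unchanged even though the model was unsure, and the second isolates the fields where the model and the validator disagreed.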

- Choose Annotate if you want to further annotate this document, or choose Use for fine-tuning if you want to use this document to retrain your model.
- Use for fine-tuning:
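- Select this option when the document is a good example for model fine-tuning and all fields are correctly referenced in the document.
- The document is imported into the training set with the Exception tag and all annotations confirmed. The document will be used for fine-tuning.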
- Annotate:
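- Select this option if the document is a good example for model fine-tuning but contains some validation errors that need further correction, for example when not all fields are correctly referenced in the document.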
- The document is imported in the training set with the Exception tag and all annotations unconfirmed. Annotations must be confirmed in the Build section in order for the document to be used to fine-tune the model.
- If you choose to further annotate your document, check the Annotate documents page for more information on how to annotate documents.
- Exclude:
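- Select this option when the document is not a good example, you want to remove it from the Exceptions for review list, and it does not need to be reviewed in future review sessions.
- You can revert this choice by changing the status of the document.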
Figure 1. Selected document used for fine-tuning or annotation
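The choice between the three options can be summarized as a small decision function. This is a sketch of the guidance above, not an API of the product: `ReviewAction`, `choose_action`, and `annotations_confirmed_on_import` are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Optional


class ReviewAction(Enum):
    USE_FOR_FINE_TUNING = "Use for fine-tuning"
    ANNOTATE = "Annotate"
    EXCLUDE = "Exclude"


def choose_action(good_example: bool, all_fields_referenced_correctly: bool) -> ReviewAction:
    """Map the reviewer's assessment of a document to one of the three options."""
    if not good_example:
        return ReviewAction.EXCLUDE
    if all_fields_referenced_correctly:
        return ReviewAction.USE_FOR_FINE_TUNING
    return ReviewAction.ANNOTATE


def annotations_confirmed_on_import(action: ReviewAction) -> Optional[bool]:
    """True or False for imported documents; None for excluded ones, which are not imported."""
    if action is ReviewAction.EXCLUDE:
        return None
    return action is ReviewAction.USE_FOR_FINE_TUNING


print(choose_action(True, True).value)    # Use for fine-tuning
print(choose_action(True, False).value)   # Annotate
print(choose_action(False, False).value)  # Exclude
```

Documents that go through Annotate still need their annotations confirmed in the Build section before they count toward fine-tuning, which is why the second helper returns False for that option.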

After you select all the documents for fine-tuning, your model is retrained using the new data from Validation Station.
You can create a new project version and use the Compare model feature from the Measure section to compare the performance of your model.
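[Preview] Exporting and importing retraining candidates
Document Understanding allows you to export retraining candidates from one environment and import them into another.
This setup is typically used when you maintain a Dev-Test-Prod environment structure and your process involves creating copies of the development Document Understanding project in higher environments, such as Test, UAT, or Production. In this arrangement, retraining documents are collected in the production project. The export-import feature lets you bring those documents back to the development environment to retrain the model, and then push the update back to the higher environments.
Exporting retraining candidates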
You can export retraining candidates directly from the Exceptions for review page by selecting the Export button. Select View exports to access the list of exported files.

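You can export all collected documents or only a selected subset, depending on how you prefer to manage the review process, as described in the following examples:
- You can export all collected documents and triage them in the environment where you plan to import them and retrain the model.
- You can complete the review in the current environment and export only the specific documents that you want to use for retraining.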

Importing retraining candidates
Importing is done from the Exceptions for review page. Imported documents are displayed in a similar manner to documents automatically collected from the Validation Station, with the To review status.
Imported documents are added as retraining candidates. For model fine-tuning, simply follow the same steps as for documents collected directly in a project. Remember to review and confirm the retraining candidates before adding them to the training set.