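Document Understanding user guide
Migrating classic projects
Use the instructions on this page to migrate classic projects or AI Center-based projects. Migrating a project involves two main steps:
- Export the dataset from the classic or AI Center-based project.
- Import the dataset into the modern project.
Current limitations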
- Currently, importing datasets larger than 5000 pages is not supported. Only the first 5000 pages are imported, and any additional pages fail to import. For example, if your dataset already contains 4999 pages and you try to import a 4-page document, that import does not succeed.
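- Batch names and their corresponding batch results are currently not available. If your data is organized in batches, this information is not displayed, but it is saved.
- Exporting from AI Center is not supported. Only exporting from Document Manager is supported.
Export the dataset from a classic project
- Navigate to the classic project you want to migrate and open it.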
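The page limit above can be checked before you start an import. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that a document is rejected whenever adding it would push the dataset past 5000 pages, which is how the 4999 + 4 example reads.

```python
PAGE_LIMIT = 5000  # limit stated in the limitations list


def can_import(current_pages: int, document_pages: int, limit: int = PAGE_LIMIT) -> bool:
    """Return True if adding the document keeps the dataset within the limit.

    Assumption: a document only imports if all of its pages fit.
    """
    return current_pages + document_pages <= limit


def pages_over_limit(page_counts, limit: int = PAGE_LIMIT) -> int:
    """Return how many pages of a planned dataset exceed the limit (0 if none)."""
    return max(0, sum(page_counts) - limit)


if __name__ == "__main__":
    print(can_import(4999, 4))             # the example from the list above
    print(pages_over_limit([2500, 2400, 300]))
```

Running a check like this on the page counts of your exported documents shows in advance whether the dataset needs to be trimmed.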
- Go to the document type you want to export and select Open document type.
Figure 1. Open document type

- From the Filter documents drop-down list, select Training and validation set.
Figure 2. Training and validation set

- Select Export.
- Leave Current search results selected and fill in a name for your export job.
- Select Download.
Figure 3. Download export

Export the dataset from an AI Center-based project
- Open AI Center and navigate to the Data Labeling page.
- Select the Data Labeling Session you want to migrate.
- Once Document Manager is open, from the Filter documents drop-down list, select Training and validation set.
Figure 4. Training and validation set

- Select Export.
- Leave Current search results selected and fill in a name for your export job.
- Select Download.
Figure 5. Download export

Import the dataset
- Navigate to the project you want to import the data into and open it.
- Select Add document type and create a new custom document type.
Figure 6. Add document type

- On the new custom document type, select Upload and choose the zip file of the classic project you exported. Wait for the upload to finish.
Note:
Exporting from AI Center is not supported. Only exporting from Document Manager is supported.
Figure 7. Upload processing

Once the upload is complete, the documents are available for training.
Model training
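Before uploading, it can help to confirm that the downloaded export is a readable zip file. The sketch below uses only the Python standard library `zipfile` module. The function name is hypothetical, and the sketch makes no assumption about what files the export contains: it only checks that the archive opens and that its members pass the CRC check.

```python
import zipfile
from pathlib import Path


def check_export_archive(path):
    """Return the member names of a zip archive, or raise ValueError."""
    archive = Path(path)
    if not archive.is_file():
        raise ValueError(f"{archive} does not exist")
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a zip archive")
    with zipfile.ZipFile(archive) as zf:
        corrupt = zf.testzip()  # first corrupt member name, or None
        if corrupt is not None:
            raise ValueError(f"corrupt member in archive: {corrupt}")
        return zf.namelist()
```

Calling `check_export_archive` with the path of your downloaded export returns the list of files inside it, which you can skim before starting the upload.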
Once the dataset is imported, the model training starts. After the training is complete, the model score is displayed. To check detailed model scores, select the score, and then Detailed model scores.

This action takes you to the Measure page where you can access detailed model metrics.
When you train an ML model twice on the same dataset, you may observe slightly different model metrics. This happens for the following reasons:
- Initialization: Machine learning uses optimization methods that need initial guesses to trigger the optimization algorithms. Different initial guesses during each training could lead to various outcomes due to the unpredictable nature of these algorithms.
- Random state: Some algorithms use randomness in their operations. For instance, when training a neural network, procedures like stochastic gradient descent and mini-batch gradient descent introduce randomness. Therefore, even with identical initial model parameters and datasets, the performance of models may vary in different runs.
- Regularization: Certain algorithms include a penalty term that encourages the model to maintain smaller weights. Due to the randomness involved, the model could operate with a different weight set each time.
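However, it is important to note that these small differences do not necessarily mean that one model is better or worse than another. Even if the metrics differ slightly, as long as the difference is not large, the model's ability to understand the data remains essentially the same. In addition, repeating the process several times and averaging the results should yield similar performance metrics.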
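The effect described above can be reproduced on a toy problem. The sketch below is an illustration only and is unrelated to how Document Understanding models are trained: it fits a single weight with stochastic gradient descent on one fixed, synthetic dataset, and only the random seed (which controls the initial guess and the sample order) changes between runs. All names in it are hypothetical.

```python
import random
import statistics


def make_data():
    """Build one fixed toy dataset: y = 3x plus a little noise."""
    rng = random.Random(42)
    xs = [i / 10 for i in range(1, 51)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.3)) for x in xs]


def train_once(data, seed, epochs=20, lr=0.005):
    """Fit y = w * x with plain SGD; return (w, mean squared error)."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)        # random initial guess
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)          # random sample order in every epoch
        for x, y in samples:
            w -= lr * 2.0 * (w * x - y) * x   # gradient step on squared error
    mse = statistics.fmean((w * x - y) ** 2 for x, y in data)
    return w, mse


if __name__ == "__main__":
    data = make_data()
    runs = [train_once(data, seed) for seed in range(5)]
    for seed, (w, mse) in enumerate(runs):
        print(f"seed={seed} w={w:.4f} mse={mse:.4f}")
    print("mean mse:", round(statistics.fmean(m for _, m in runs), 4))
```

Each run ends with a slightly different weight and error even though the data is identical, while every run stays close to the underlying slope of 3. Averaging over several runs gives a steadier figure than any single run.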
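Changing the base model in Document type manager
If there are significant differences between the model results of the classic project and those of the modern project, the cause may be a different base model. To change the base model, follow these steps:
- Select the three-dot menu from your custom document type and choose Document type manager.
- Navigate to the Settings tab.
- Select the desired model from the Base model drop-down list.
- After making your selection, select Save. To exit, select Back.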
Export types
For classic projects, there are various methods for exporting data. Not all types of exported data can be imported into modern projects. To compare the model results across both project types, filter documents by Training and validation set and select Current search results to export the dataset. For more information on each option, check the following table.
| Export type | Exported data | What happens with the imported data |
|---|---|---|
| Current search results | Exports the current filtered dataset. Use it together with the Training and validation set filter. | Documents tagged as training are used to train the model. Documents tagged as validation are used to measure the model performance. Tip: To compare model results between two project types, always export and import the dataset as Training and validation set. |
| All labeled | Exports all annotated documents from the dataset. | |
| Schema | Exports the list of fields and their respective settings. | If no schema exists, the schema is imported. If a schema is already defined, the import fails. |
| All | Exports all documents, both annotated and unannotated. | |
Import the schema
You can import a schema together with the dataset into a modern project. Follow these steps to import the schema:
- Create a custom document type in the Build section.
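- Import the zip file that contains the schema.
Note:
- Schema import is limited to custom document types that have no pre-existing schema.
- If you import a schema into a document type that already contains a schema, the import fails.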
Migrate the automation workflow
Migrating from a classic DU project to a modern one in your RPA automation requires a single change: replace the ML Extractor Activity inside the Data Extraction Scope with a Document Understanding Project Extractor. No other activities need to change — digitization, validation, and training activities remain the same.
If your workflow uses document classification, also replace the existing classifier with a Document Understanding Project Classifier. See Migrate classification below.
Replace the ML Extractor Activity
- In your Studio project, open the Data Extraction Scope activity.
- Remove the existing ML Extractor Activity.
- Add a Document Understanding Project Extractor inside the Data Extraction Scope.
- Select Get or refresh extractor capabilities to open the configuration wizard.
- Under Design time credentials, enter your App Id, App Secret, and Tenant Url.
- Select Get Projects to load the list of available modern projects.
- For Project, select your desired modern project from the dropdown list.
- For Version, select a deployed version of the project. Alternatively, select a Tag linked to a specific version. Version and Tag are mutually exclusive.
- Select Get Capabilities.
- Make sure Update Activity Arguments is checked.
If you connect to a project in a different tenant, configure the Authentication properties of the activity — Runtime Credentials Asset and Runtime Tenant Url — to match the credentials used in the wizard.
For full configuration details, see Document Understanding Project Extractor.
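The rule that Version and Tag are mutually exclusive can be expressed as a small validation check, for example in a script that keeps track of which project version each workflow targets. This is a sketch under stated assumptions: the class and field names are hypothetical and do not correspond to any UiPath API, and the sample project name is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractorSelection:
    """Hypothetical record of the wizard choices for one extractor."""
    project: str
    version: Optional[str] = None
    tag: Optional[str] = None

    def __post_init__(self):
        if not self.project:
            raise ValueError("a project must be selected")
        # Exactly one of version or tag may be set, never both and never neither.
        if (self.version is None) == (self.tag is None):
            raise ValueError("set exactly one of version or tag")


if __name__ == "__main__":
    print(ExtractorSelection(project="Invoices", tag="production"))
```

Rejecting invalid combinations at construction time keeps an inconsistent selection from spreading into later steps of such a script.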
Migrate classification
If your automation uses document classification, replace the existing classifier with a Document Understanding Project Classifier inside the Classify Document Scope. The configuration steps mirror those of the extractor: open the Configure Classifiers Wizard, enter your design time credentials, select your project and version or tag, then select Get Capabilities.
For full configuration details, see Document Understanding Project Classifier.