Document Understanding Activities
Form Extractor
UiPath.IntelligentOCR.Activities.DataExtraction.FormExtractor
Description
For licensing reasons, the Form Extractor activity requires an Internet connection when the robot runs.
The Form Extractor is best suited for extracting, matching, and reporting specific information by analyzing the position of words inside the document, or by detecting a signature. This activity can be used only together with the Data Extraction Scope activity. Handwritten text can also be detected if the Form Extractor activity is used along with the UiPath Document OCR activity.
Project compatibility
Windows - Legacy | Windows
Configuration
Properties panel
Common
- DisplayName - The display name of the activity.
Input
- ApiKey - Specifies the API key of the account. The ApiKey field is automatically pre-populated if it is defined in the local project settings or in the Document Understanding framework.
- Endpoint - The URL of the UiPath® server. By default, the endpoint is https://du.uipath.com/svc/formextractor. For more information, visit Document Understanding Public Endpoints.
- MinOverlapPercentage - Specifies the minimum overlap area (as a percentage) between a box in the document and a box in the template that is required to make an extraction. The value can be set between 0 and 100. The default value is 65.
- Timeout - Specifies the amount of time (in milliseconds) to wait for a response from the server before an error is thrown. The default value is 100000 milliseconds (100 seconds).
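To make the effect of MinOverlapPercentage more concrete, the following sketch computes how much of a template box is covered by a document box and compares the result with the threshold. The `Box` type, the function names, and the choice of dividing by the template box area are assumptions made for illustration only; this page does not state how the Form Extractor computes the overlap internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle (hypothetical type, not a UiPath class)."""
    left: float
    top: float
    width: float
    height: float


def overlap_percentage(document_box: Box, template_box: Box) -> float:
    """Percentage of the template box area covered by the document box.

    Assumption: the overlap is measured relative to the template box area.
    """
    x_overlap = (min(document_box.left + document_box.width,
                     template_box.left + template_box.width)
                 - max(document_box.left, template_box.left))
    y_overlap = (min(document_box.top + document_box.height,
                     template_box.top + template_box.height)
                 - max(document_box.top, template_box.top))
    template_area = template_box.width * template_box.height
    if x_overlap <= 0 or y_overlap <= 0 or template_area <= 0:
        return 0.0
    return 100.0 * x_overlap * y_overlap / template_area


def meets_min_overlap(document_box: Box, template_box: Box,
                      min_overlap_percentage: float = 65.0) -> bool:
    """True when the overlap reaches the threshold (65 is the documented default)."""
    return overlap_percentage(document_box, template_box) >= min_overlap_percentage
```

With the default of 65, a document box that covers only half of the template box would be rejected under this model, while one that covers 80 percent would be accepted.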
Misc
- Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Note:
Multiple templates can be defined for one Document Type. When the activity is run, the extractor selects the best matching template based on the information found on the first page.
Template Manager wizard
Allows you to create, edit, manage, and export or import templates for the document types defined in the taxonomy.
Create a template
- Add a Form Extractor activity to your workflow, inside a Data Extraction Scope.
- Configure the extractor by selecting Manage Templates.
The Template Manager window opens.
Figure 1. Overview of the Template Manager wizard

- Select Create Template to create a new template.
Figure 2. Overview of the Create a new template configuration fields
Note: If the UiPath.IntelligentOCR.Activities package has been updated to v5.1.0, the ForceApplyOCR parameter has been replaced with ApplyOcrOnPDF. The old and new parameters correspond as follows:
- ForceApplyOCR = True is replaced by ApplyOcrOnPDF = Yes;
- ForceApplyOCR = False is replaced by ApplyOcrOnPDF = Auto;
- ForceApplyOCR = Empty is replaced by ApplyOcrOnPDF = Auto;
- ForceApplyOCR = <user-defined variable> is replaced by ApplyOcrOnPDF = Auto.
The Apply OCR on PDF option determines whether OCR is applied to PDF documents. Three options are available in the dropdown list: True, False, and Auto. If set to True, OCR is applied to all PDF pages of the document. If set to False, only digitally typed text is extracted. The default value is Auto, which decides whether OCR is needed based on the input document. Each OCR engine comes with its own set of custom options. Visit OCR Engine for more details about all options available for each OCR engine. The default OCR engine is UiPath Document OCR.
- Select the document type of the template from the Document type dropdown list.
Note: All document types are based on the taxonomy. Make sure a taxonomy is added or created in the project folder.
- Add a name for your template in the Template name field. Choose a relevant name that reflects the version or layout of the document.
- Add the document's path in the Template document field. Navigate to the file's path by using the Browse option.
- Select an OCR engine from the OCR Engine dropdown list, and configure it according to your needs.
- Select Configure to start editing the template.
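The ForceApplyOCR to ApplyOcrOnPDF compatibility rules from the note in the steps above can be summarized in a few lines. This is only a sketch for reasoning about a migration: the function name is hypothetical, an empty value is modeled as `None`, and a user-defined variable is modeled as any other value, such as a variable name.

```python
from typing import Union


def migrate_force_apply_ocr(force_apply_ocr: Union[bool, str, None]) -> str:
    """Map a legacy ForceApplyOCR setting to the ApplyOcrOnPDF value listed in the note.

    True                  -> "Yes"
    False                 -> "Auto"
    Empty (None)          -> "Auto"
    user-defined variable -> "Auto" (modeled here as any other value)
    """
    if force_apply_ocr is True:
        return "Yes"
    return "Auto"
```

Only an explicit True carries over as a forced setting; every other legacy value falls back to Auto.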
If you have already created a template, you can edit, export, or remove it. The Delete and Export options become available only when at least one template is selected. The Edit and Remove options for an individual template are always available.
Figure 3. Animated image of selecting the Delete or Export options for a template

Configure Boolean field handling
For documents that include check boxes, you can add known synonyms for the Yes and No options, or you can start from a list compiled by UiPath® (select Add Recommended). These values are used for Boolean content interpretation, that is, for mapping a captured value to a reported value of Yes or No.
Figure 4. Animated image showing the suggestion generated after selecting Add recommended for the Synonyms for Yes and Synonyms for No fields

Select the Case sensitive check box if the synonyms you have added are case sensitive.
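The Boolean content interpretation described above can be pictured as a lookup of the captured value in two synonym lists. The sketch below is an illustration, not the activity's implementation: the function name, the whitespace trimming, and the `None` result for unmatched values are all assumptions.

```python
from typing import Iterable, Optional


def interpret_boolean(captured: str,
                      yes_synonyms: Iterable[str],
                      no_synonyms: Iterable[str],
                      case_sensitive: bool = False) -> Optional[str]:
    """Map a captured check box value to "Yes" or "No" using synonym lists.

    Returns None when the value matches neither list (assumed behavior).
    """
    def normalize(text: str) -> str:
        text = text.strip()
        return text if case_sensitive else text.casefold()

    value = normalize(captured)
    if value in {normalize(s) for s in yes_synonyms}:
        return "Yes"
    if value in {normalize(s) for s in no_synonyms}:
        return "No"
    return None
```

For example, with `case_sensitive=False` a captured "x" matches the synonym "X" and is reported as Yes, while with `case_sensitive=True` it does not match.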
Export and import templates
You can import templates created and exported from other workflows. Use these features to share templates between projects. Once a document type is configured using the Form Extractor, you don't need to reconfigure the templates in a new implementation.
Export procedure
Follow these steps to export templates:
- Create one or more templates by following the steps described at the beginning of this page.
- Select the templates you want to export.
- Select an Export option:
  - Export with original files: attaches the original files to the export.
  - Export without original files
Figure 5. The action of selecting the Export with original files option

- Save the archive of templates under the desired name.
- A message is displayed once the template is saved. Select OK.
Figure 6. The "X" template(s) successfully exported message
Note: If you cannot share the content of the documents you have built your templates on, use the Export without original files option. You can still share the template archive and import it in other projects, but you can no longer edit or view the templates.
If you want to edit the templates after importing them into a different project, make sure to export them with the Export with original files option and to import them with their original files as well.
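Import procedure
Follow these steps to import templates:
- Select Import.
Figure 7. The action of selecting Import in the Template Manager wizard

- Select an archive. The import wizard appears and presents all document types and all templates available in the selected export archive. Select the templates you wish to import and choose the desired Import option:
  - Import with original files
  - Import without original files
Figure 8. The Import options in the Template Manager wizard
Note:
- When you import templates, the document types are automatically created in the project's taxonomy. If a document type with the same name already exists, another document type is created by appending a count to the document type name.
- If you import templates that were exported without the original files, or if you choose to import templates without the original files, these templates have no view or edit options.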
Special cases when importing templates
Several special cases can occur when you import templates. The following list describes each case and its particularities:
- New document type: If a new document type is imported, then a new field is added in the wizard configurator, informing you that a new template is to be created.
- Duplicate document type: If an identical document type is imported, then the following warning message appears: "This template already exists and it will be overwritten."
- Extended template: If you import a document type template that includes more fields than the existing one, then the following warning message appears: "This document type will be updated as follows: The following field(s) do not exist and will be created".
- Extended document type: If you import a document type that includes more fields than the existing one, then the following warning message appears: "This document type will be updated as follows: The following field(s) don't have configurations to import".
- Document type with identical name but different content: If you import a document type that has the same name as the existing one but different fields, then the following warning message appears: "This document type will be updated as follows":
  - "The following field(s) do not exist and will be created"
  - "The following field(s) don't have configurations to import"
- Document type with missing table: If you import a document type that doesn't include a table, then the following warning message appears: "This document type will be updated as follows: The following field(s) don't have configurations to import."
- Document type with extended table: If you import a document type that includes a table with extra columns, then the following warning message appears: "This document will be updated as follows: The following field(s) do not exist and will be created".
- Document type with reduced table: If you import a document type that includes a table with missing columns, then the following warning message appears: "This document will be updated as follows: The following field(s) don't have configurations to import"
- Table template with different document types: If you import a document type template that includes a table with different document types, then a new template is created. If your taxonomy includes a table that has a field with a different document type, then the following message appears: "The field with id xyz was found both in the imported taxonomy as well in the existing taxonomy but their types are incompatible (either both should be tables or neither of them)."
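Several of the cases above, notably the extended template and the extended and reduced table cases, come down to comparing two sets of field names. The sketch below previews which warning would list which fields. It is a simplification inferred from those cases: the function and key names are hypothetical, it ignores field types (so it does not cover the incompatible table case), and it does not model the separate "Extended document type" wording.

```python
from typing import Dict, List, Set


def preview_import(existing_fields: Set[str],
                   imported_fields: Set[str]) -> Dict[str, List[str]]:
    """Group field names by the import warning that would mention them.

    Assumption: fields present only in the import are created, and fields
    present only in the existing document type have no configuration to import.
    """
    return {
        "will_be_created": sorted(imported_fields - existing_fields),
        "no_configuration_to_import": sorted(existing_fields - imported_fields),
    }
```

A document type with an identical name but different content would then produce entries in both groups, matching the two messages listed for that case.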
Template Editor wizard
General considerations
The Template Editor is built on top of the functionality present in the Validation Station. To access it, select Edit for a template.
Visit Validation Station to learn about the basic usage of the Validation Station.
Besides the options available in the right part of the Validation Station screen, there are two options specific to the Template Editor:
- An option that sets the Anchor Selection mode;
- An option that clears the entire anchor selection.
When you create a new template, an explanation text appears the first time you open the Template Editor. To access the text again, go to the document view section on the right side, select More Options, and then select Show explanation text.
Figure 9. The action of showing the explanation text

Table information can be modified at cell or table level. Visit Present Validation Station for more information about how to configure tables at cell level and at table level.
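Configure anchors
Once you open the Template Editor from the Template Manager, you can define anchors, which are available under the Selection mode options.
When defining or editing a page-level template, you first need to select the Page 1 Matching Info, although this is optional. This step is mandatory only for fixed-form templates.
The Page 1 Matching Info option is located on the left side of the screen. It requires text input (only tokens are accepted) from the first page of your template. This text must always be found in the same place in that particular template layout, and it must form a word graph (taking into account the relative distances and angles between words) that is unique among all the templates defined for a specific document type.
In other words, the Page 1 Matching Info (and all the other Page Matching Info fields) acts as a "fingerprint" of a particular page, and it is used extensively to identify the correct matching template at runtime.
For this reason, for the Page 1 Matching Info field, it is strongly recommended to select 10 to 20 words, preferably longer ones, spread across the entire page area.
The other Page Matching Info fields (one for each template page) must be filled in only if you are trying to extract data from that specific page, and they no longer need to be unique across templates. If you don't need to extract any fields from a particular page, you don't have to define the page-level matching info for that page.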
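To illustrate what a word graph built from relative distances and angles might look like, the sketch below computes both values for every pair of selected words. This is an illustration of the concept only: the `Word` representation and the `word_graph` function are hypothetical, the sketch assumes each selected word text is unique, and this page does not describe how the Form Extractor actually builds or compares page fingerprints.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

# (text, center_x, center_y): hypothetical representation of a selected token
Word = Tuple[str, float, float]


def word_graph(words: List[Word]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return (distance, angle in degrees) between the centers of every word pair."""
    graph: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for (text_a, xa, ya), (text_b, xb, yb) in combinations(words, 2):
        distance = math.hypot(xb - xa, yb - ya)
        angle = math.degrees(math.atan2(yb - ya, xb - xa))
        graph[(text_a, text_b)] = (distance, angle)
    return graph
```

Selecting more words that are spread across the page yields more pairs with more varied distances and angles, which is one way to see why 10 to 20 well-distributed words are recommended for Page 1 Matching Info.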
Configure simple fields
For all fields other than tables, configuring the template consists of selecting a Custom Area and assigning it to a particular field.
For fixed-form configurations, data fields can be configured only by using the Custom Area selection.
For a field you can define one or more such Custom Areas, using the Add button. If two or more Custom Areas are defined for a single field, then at runtime, if the field is defined in the Taxonomy as Single Value, all values are concatenated into a single reported value. If the field is defined as Multi Value, then each value is reported individually.
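The difference between Single Value and Multi Value reporting for a field with several Custom Areas can be sketched as follows. The function name and the separator used for concatenation are assumptions; this page does not specify which separator, if any, the Form Extractor inserts between concatenated values.

```python
from typing import List


def report_field_values(area_values: List[str],
                        is_multi_value: bool,
                        separator: str = " ") -> List[str]:
    """Combine the values captured from a field's Custom Areas.

    Multi Value fields report each value individually; Single Value fields
    report one concatenated value (separator is an assumed detail).
    """
    if is_multi_value:
        return list(area_values)
    return [separator.join(area_values)] if area_values else []
```

For example, two areas that capture "John" and "Doe" yield one reported value for a Single Value field and two reported values for a Multi Value field.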
The icon beside each field indicates the type of supported selection: Tokens or Custom area.
Figure 10. Animated image showing the types of supported selections for sample fields

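If you select a blank area, the selection is automatically set as Custom Area. If text is detected inside the selected area, you are asked to choose the type of content you want: Tokens or Custom Area.
Use the Selection Mode feature of the Validation Station to lock your choice between Tokens and Custom Area.
Configure tables
As mentioned above, there are fields where information can be added only by using Tokens (like the Page Matching Info fields) or only by using a Custom Area (like simple fields). For Table fields, you can do the following:
- Define each cell one by one, once the Table Editor is expanded, by adding a Custom Area selection to each cell individually;
- Use the table markup functionality, by marking the table area, drawing the row and column separators, and then assigning the table marked in this way to a field. Make sure that the extracted area has the same number of columns and rows as the template area.
To use the table markup functionality, follow these steps:
- Select More Options for the table field.
- Select Extract new table.
- Select the table you want to extract.
- For every field above each table column, select the column name that you want it to represent. You can also choose to Extract header.
- Lastly, select Save new table.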
Figure 11. Animated image of an example using the table markup functionality

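Anchor configuration
A distinctive way of defining the scope of the custom area from which to extract data is to use field-level anchors. These anchors let you extract data based on field-level configurations, which gives you more flexibility in defining form extraction rules.
Consequently, at run-time, the Form Extractor knows how to perform the following:
- Identify whether a page-level template matches, and extract information according to the page-level template that it determines to be the best match;
- Identify whether any anchor-based settings match, and extract information according to how these settings apply to the document being processed;
- Compute the corresponding confidence scores for all possible matches, so that it reports the best result (the match with the highest probability) out of all available options.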
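The last point above, reporting the match with the highest confidence among all candidates, can be sketched as a simple selection. The `Candidate` type and its fields are hypothetical, and the way confidence scores are computed is not described on this page, so the scores are treated as given inputs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One possible extraction for a field (hypothetical structure)."""
    source: str        # for example "page-level template" or "field-level anchor"
    value: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def best_candidate(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the highest confidence, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.confidence)
```

Under this model, a field-level anchor match can win over a page-level template match for the same field whenever its confidence is higher, and the reverse.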
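Create a new anchor setting
- Make sure you are in the Anchor Selection mode.
- Draw a box around the value area.
- Select the label (main anchor) for the value area, using one of the following methods:
  - Select the first word, and then use Ctrl + Select on the last word of the selection.
  - Select, drag, and then release to capture a range of words.
Note: The label can only contain consecutive words from the same visual line.
- Select any other anchors that will be used to uniquely identify your label. The same selection principles apply.
- Assign the anchor structure to the corresponding field by selecting Extract value for that field.
Figure 12. Example of creating multiple anchors for a field
Note: You can also use the previous examples from this page to learn how to create a template and define extraction areas and anchors.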
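Edit an existing anchor setting
- Highlight your anchor setting.
- Make your changes (delete any anchors or the label as needed, or even delete the value area, add new elements, and so on).
- Select More Options for a field anchor, and then use the Change Extracted Value option to update your field association.
Figure 13. Example of changing the extracted value for a field
Note:
- If you delete the target area, all anchors are deleted and you need to start over.
- If you delete the label (main anchor), the first anchor (in order of creation) becomes the new label.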
Delete an existing anchor setting
To delete an anchor setting, use one of the following options:
- Select More Options for a field anchor and use the Mark as Missing option for a saved value.
Figure 14. Example of using the Mark as Missing option to delete an anchor setting

- Select More Options for a field anchor and use the Remove Value option, in the case of a list of anchors defined for a given field.
Figure 15. Example of using the Remove Value option to delete an anchor setting

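Mix and match configurations
You can define as many templates as you need for the same document type. You can have multiple page-level templates, the same field can have multiple anchors, and a template can even contain both page-level anchors and field-level anchors.
- When defining field-level anchors, make sure that the label is close to the value area and, if the same text construct can be found in multiple places in the same document, that other anchors support the label.
- The longer the label and the anchors, the higher the precision you get.
- The value area is always computed based on its position relative to the label (main anchor). Choose the main anchor accordingly.
- With field-level anchors, a field can move within the template and still be captured, which provides more flexibility when document layouts change.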
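The statement that the value area is computed relative to the label (main anchor) can be pictured with a small geometric sketch. It assumes a pure translation, with no scaling or rotation, and uses a hypothetical `Rect` type; the real matching logic of the Form Extractor is not described on this page and is likely more involved.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical type, not a UiPath class)."""
    left: float
    top: float
    width: float
    height: float


def relative_offset(label: Rect, value_area: Rect) -> Tuple[float, float]:
    """Offset of the value area from the label, as recorded in the template."""
    return (value_area.left - label.left, value_area.top - label.top)


def locate_value_area(label_at_runtime: Rect,
                      offset: Tuple[float, float],
                      template_value_area: Rect) -> Rect:
    """Place the value area at the same offset from wherever the label was found."""
    dx, dy = offset
    return Rect(label_at_runtime.left + dx, label_at_runtime.top + dy,
                template_value_area.width, template_value_area.height)
```

If the label is found lower on the page at runtime, the value area moves with it, which is why a field can shift within the layout and still be captured.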
Document Understanding integration
The Form Extractor activity is part of the Document Understanding solutions. Visit the Document Understanding Guide for more information.