Document Understanding Activities

Last updated Apr 22, 2026

Form Extractor

UiPath.IntelligentOCR.Activities.DataExtraction.FormExtractor

Description

Note:

For licensing purposes, the Form Extractor activity requires an Internet connection to run.

The Form Extractor is best suited for extracting, matching, and reporting specific information by analyzing the position of words inside the document, or by detecting a signature. This activity can be used only together with the Data Extraction Scope activity. Handwritten text can also be detected if the Form Extractor activity is used along with the UiPath Document OCR activity.

Project compatibility

Windows - Legacy | Windows

Configuration

Properties panel

Common

  • DisplayName - The display name of the activity.

Input

  • ApiKey - Specifies the API key of the account. The API Key field is automatically pre-populated if defined in local project settings or in the Document Understanding framework.
  • Endpoint - The URL of the UiPath® server. By default, the endpoint is https://du.uipath.com/svc/formextractor. For more information, visit Document Understanding Public Endpoints.
  • MinOverlapPercentage - Specifies the minimum overlap area (in percentage) between a box in the document and a box in the template required to make an extraction. The percentage value can be set between 0 and 100. The default value is 65.
  • Timeout - Specifies the amount of time (in milliseconds) to wait for a response from the server before an error is thrown. The default value is 100000 milliseconds (100 seconds).
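
To make the MinOverlapPercentage property more concrete, here is a minimal Python sketch of an overlap check between a box in the document and a box in the template. The exact formula used by the Form Extractor is not given on this page, so the definition below (intersection area as a percentage of the template box area) is an assumption, and the `Box`, `overlap_percentage`, and `meets_min_overlap` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; (left, top) is the upper-left corner."""
    left: float
    top: float
    width: float
    height: float


def overlap_percentage(document_box: Box, template_box: Box) -> float:
    """Intersection area as a percentage of the template box area.

    Assumption: the real activity may normalize the overlap differently.
    """
    x_overlap = (min(document_box.left + document_box.width,
                     template_box.left + template_box.width)
                 - max(document_box.left, template_box.left))
    y_overlap = (min(document_box.top + document_box.height,
                     template_box.top + template_box.height)
                 - max(document_box.top, template_box.top))
    if x_overlap <= 0 or y_overlap <= 0:
        return 0.0  # the boxes do not intersect
    template_area = template_box.width * template_box.height
    if template_area == 0:
        return 0.0
    return 100.0 * x_overlap * y_overlap / template_area


def meets_min_overlap(document_box: Box, template_box: Box,
                      min_overlap_percentage: float = 65.0) -> bool:
    """Apply the threshold; 65 mirrors the documented default value."""
    if not 0 <= min_overlap_percentage <= 100:
        raise ValueError("MinOverlapPercentage must be between 0 and 100")
    return overlap_percentage(document_box, template_box) >= min_overlap_percentage
```

Under this assumed definition, a document box that is shifted by a fifth of the template box width still overlaps by 80 percent and passes the default threshold, while a box shifted by half the width does not.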

Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
    Note:

    Multiple templates can be defined for one Document Type. When the activity is run, the extractor selects the best matching template based on the information found on the first page.

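Template Manager wizard

Enables you to create, edit, manage, and export or import templates for the document types defined in the taxonomy.

Creating a template
  1. Add the Form Extractor activity to your workflow, inside a Data Extraction Scope.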

  2. Configure the extractor by selecting Manage Templates.

    The Template Manager window opens.

    Figure 1. Overview of the Template Manager wizard

  3. Select Create Template to create a new template.

    Figure 2. Overview of the Create a new template configuration fields

    Note:

    If the UiPath.IntelligentOCR.Activities package has been updated to v5.1.0, then the ForceApplyOCR parameter has been replaced by the ApplyOcrOnPDF parameter. Here is the compatibility between the old and new parameters:

    • ForceApplyOCR = True is replaced by ApplyOcrOnPDF = Yes;
    • ForceApplyOCR = False is replaced by ApplyOcrOnPDF = Auto;
    • ForceApplyOCR = Empty is replaced by ApplyOcrOnPDF = Auto;
    • ForceApplyOCR = <user-defined variable> is replaced by ApplyOcrOnPDF = Auto.

    The Apply OCR on PDF option determines whether OCR is applied to PDF documents. Three options are available in the dropdown list: True, False, and Auto. If set to True, OCR is applied to all PDF pages of the document. If set to False, only digitally typed text is extracted. The default value is Auto, which decides whether OCR is needed based on the input document. Each OCR engine comes with its own set of custom options. Visit OCR Engine for more details about the options available for each OCR engine. The default OCR engine is UiPath Document OCR.

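  4. Select the document type for the template from the Document Type dropdown list.

    Note:

    All document types are based on the taxonomy. Make sure that a taxonomy is added or created in the project folder.

  5. Add a name for the template in the Template Name field. Choose a relevant name that reflects the document version or layout.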

  6. Add the document's path in the Template document field. Navigate to the file's path by using the Browse option.

  7. Select an OCR engine from the OCR Engine dropdown list, and configure it according to your needs.

  8. Select Configure to trigger the template editing.
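
The parameter compatibility listed in the note under step 3 can be summarized as a small lookup. The Python sketch below only restates that list for illustration; the function name `migrate_force_apply_ocr` is hypothetical and is not part of the UiPath.IntelligentOCR.Activities package.

```python
from typing import Any


def migrate_force_apply_ocr(force_apply_ocr: Any) -> str:
    """Return the ApplyOcrOnPDF value that replaces a ForceApplyOCR setting.

    Only a literal True maps to "Yes". False, an empty value (None), and a
    user-defined variable (represented here by any other object) map to "Auto".
    """
    return "Yes" if force_apply_ocr is True else "Auto"


# Usage: the four documented cases.
for old_value in (True, False, None, "myUserDefinedVariable"):
    print(repr(old_value), "->", migrate_force_apply_ocr(old_value))
```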

If you have already created a template, then it can be edited, exported, or removed. Delete and Export options become available only when at least one template is selected. The Edit and Remove options for an individual template are always available.

Figure 3. Animated image of selecting the Delete or Export options for a template

Configuring Boolean field handling

For documents that include check boxes, you can add known synonyms for the Yes and No options, or you can start from a list compiled by UiPath® (select Add Recommended). These values are used for Boolean content interpretation, which is mapping a captured value to a Yes or No reported value.

Figure 4. Animated image showing the suggestion generated after selecting Add recommended for the Synonyms for Yes and Synonyms for No fields

Note:

The Case sensitive check box needs to be checked if the synonyms you have added are case sensitive.
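
As an illustration of Boolean content interpretation, the following Python sketch maps a captured value to a reported Yes or No value using synonym lists and a case-sensitivity flag. It is a simplified model rather than the activity's actual logic; the function name and the sample synonyms are hypothetical.

```python
from typing import Iterable, Optional


def interpret_boolean(captured: str,
                      yes_synonyms: Iterable[str],
                      no_synonyms: Iterable[str],
                      case_sensitive: bool = False) -> Optional[str]:
    """Map a captured value to "Yes" or "No", or None if no synonym matches."""

    def normalize(text: str) -> str:
        # Surrounding whitespace is ignored; case is ignored unless the
        # Case sensitive option is modeled as enabled.
        text = text.strip()
        return text if case_sensitive else text.casefold()

    value = normalize(captured)
    if value in {normalize(s) for s in yes_synonyms}:
        return "Yes"
    if value in {normalize(s) for s in no_synonyms}:
        return "No"
    return None
```

With the hypothetical synonym lists `["X", "Checked"]` and `["Unchecked"]`, a captured `x` is reported as Yes by default, but matches nothing when `case_sensitive=True`.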

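Exporting and importing templates

You can import templates created and exported from other workflows. Use these features to share templates between projects. Once a document type is configured using the Form Extractor, you don't need to reconfigure the templates in a new implementation.

Export procedure

These are the steps to follow when exporting templates:

  1. Create one or more templates by following the steps described at the beginning of this page.

  2. Select the templates that you want to export.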

  3. Select an Export option:

    1. Export with original files. Exporting with original files attaches them to the export.

    2. Export without original files

      Figure 5. The action of selecting the Export with original files option

  4. Save the template archive under the desired name.

  5. A message is displayed once the template is saved. Select OK.

    Figure 6. The "X" template(s) successfully exported message

    Note:

    If you cannot share the content of the documents you have built your templates on, then use the Export without original files option. You are still able to share and import the template archive in other projects, but you cannot edit or view them anymore.

    If you want to edit the templates once imported in a different project, make sure to use the Export with original files option when exporting and then importing them.

Import procedure

These are the steps to follow when importing templates:

  1. Select Import.

    Figure 7. The action of selecting Import in the Template Manager wizard

  2. Select an archive. The import wizard appears and presents all document types and all templates available in the selected export archive. Select the templates you wish to import and choose the desired Import option:

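    1. Import with original files

    2. Import without original files

      Figure 8. The Import options in the Template Manager wizard

      Note:
      • When templates are imported, the document types are created automatically in the project's taxonomy. If a document type with the same name already exists, another document type is created by appending a count to the document type name.
      • If you import templates that were exported without original files, or if you choose to import the templates without original files, those templates have no view or edit options.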

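Special cases when importing templates

Several special cases can occur when importing templates. The following list explains each case and its particularities: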

  • New document type: If a new document type is imported, a new field is added in the wizard configurator, informing you that a new template is to be created.
  • Duplicate document type: If an identical document type is imported, the following warning message appears: "This template already exists and it will be overwritten."
  • Extended template: If you import a document type template that includes more fields than the existing one, the following warning message appears: "This document type will be updated as follows: The following field(s) do not exist and will be created".
  • Extended document type: If you import a document type that includes more fields than the existing one, the following warning message appears: "This document type will be updated as follows: The following field(s) don't have configurations to import".
  • Document type with identical name but different content: If you import a document type that has the same name as the existing one but different fields, the following warning message appears: "This document type will be updated as follows":
    • "The following field(s) do not exist and will be created"
    • "The following field(s) don't have configurations to import"
  • Document type with missing table: If you import a document type that doesn't include a table, the following warning message appears: "This document type will be updated as follows: The following field(s) don't have configurations to import."
  • Document type with extended table: If you import a document type that includes a table with extra columns, the following warning message appears: "This document will be updated as follows: The following field(s) do not exist and will be created".
  • Document type with reduced table: If you import a document type that includes a table with missing columns, the following warning message appears: "This document will be updated as follows: The following field(s) don't have configurations to import".
  • Table template with different document types: If you import a document type template that includes a table with different document types, a new template is created. If your taxonomy includes a table that has a field with a different document type, the following message appears: "The field with id xyz was found both in the imported taxonomy as well in the existing taxonomy but their types are incompatible (either both should be tables or neither of them)."
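
One plausible reading of the table-related and "identical name but different content" cases above is a comparison between the field names of the existing document type and those of the imported one. The Python sketch below follows that reading only; it is an assumption about the wizard's behavior, not a description of its implementation, and `import_warnings` and the sample field names are hypothetical.

```python
from __future__ import annotations


def import_warnings(existing_fields: set[str], imported_fields: set[str]) -> list[str]:
    """List the warning texts suggested by a simple field-set comparison."""
    warnings: list[str] = []
    # Fields present only in the import are assumed to be created.
    to_create = sorted(imported_fields - existing_fields)
    # Fields present only in the existing type get no imported configuration.
    without_config = sorted(existing_fields - imported_fields)
    if to_create:
        warnings.append("The following field(s) do not exist and will be created: "
                        + ", ".join(to_create))
    if without_config:
        warnings.append("The following field(s) don't have configurations to import: "
                        + ", ".join(without_config))
    return warnings


# Usage: same document type name, different content.
for line in import_warnings({"Name", "Date", "Total"}, {"Name", "Date", "Tax ID"}):
    print(line)
```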

Template Editor wizard

General considerations

The Template Editor is built on top of the functionality present in the Validation Station. To access it, select Edit for a template.

Visit Validation Station to learn about the basic usage of the Validation Station.

Besides the options available in the right part of the Validation Station screen, there are two options specific to the Template Editor:

  • Anchor: sets the anchor selection mode;
  • Clear Anchor: clears the entire anchor selection.

When creating a new template, an explanation text appears when you first open the Template Editor. If you want to access the text again, go to the document view section on the right side, select More Options, and then Show explanation text.

Figure 9. The action of showing the explanation text

Table information can be modified at cell or table level. Visit Present Validation Station for more information about how to configure tables at cell level and at table level.

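Configuring anchors

Once the Template Manager opens the Template Editor, you can define anchors, which are available in the Selection Mode options.

When defining or editing a page-level template, you should first select the Page 1 Matching Info, although it is optional. This step is mandatory only for fixed form templates.

The Page 1 Matching Info option, found on the left side of the screen, requires a text input (only Tokens are accepted) from the first page of the template. This text must always appear in the same position in that particular template layout, and must form a word graph (taking into account the relative distances and angles between words) that is unique among all templates defined for a particular document type.

In other words, the Page 1 Matching Info (and all other Page Matching Info fields) acts as a "fingerprint" of a particular page and is used extensively to identify the correct matching template at run-time.

For this reason, for the Page 1 Matching Info field, it is strongly recommended to select 10 to 20 words, preferably longer, spread across the entire page area.

The other Page Matching Info fields (one for each template page) must be filled in only if you are trying to extract data from that particular page, and uniqueness across templates is no longer required for them. If no field needs to be extracted from a particular page, you do not have to define the page-level matching info for that page.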
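
The idea of a page "fingerprint" built from relative distances and angles between words can be sketched as follows. This is a simplified Python model and not the matching algorithm used by the Form Extractor: the function names and tolerance values are hypothetical, each word is assumed to appear once per page, and page scaling or rotation is ignored.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Point = Tuple[float, float]
WordGraph = Dict[Tuple[str, str], Tuple[float, float]]


def word_graph(word_centers: Dict[str, Point]) -> WordGraph:
    """Map each word pair to (distance, angle in degrees) between word centers."""
    graph: WordGraph = {}
    for (a, (ax, ay)), (b, (bx, by)) in combinations(sorted(word_centers.items()), 2):
        dx, dy = bx - ax, by - ay
        graph[(a, b)] = (math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)))
    return graph


def graphs_match(template: WordGraph, candidate: WordGraph,
                 distance_tolerance: float = 5.0,
                 angle_tolerance: float = 2.0) -> bool:
    """True if both graphs contain the same pairs with similar geometry."""
    if template.keys() != candidate.keys():
        return False
    for pair, (t_dist, t_angle) in template.items():
        c_dist, c_angle = candidate[pair]
        # Compare angles on a circle so that 179 and -179 degrees are close.
        angle_diff = abs((t_angle - c_angle + 180.0) % 360.0 - 180.0)
        if abs(t_dist - c_dist) > distance_tolerance or angle_diff > angle_tolerance:
            return False
    return True
```

Because only relative geometry is compared, a page whose content is uniformly shifted (for example by a scanner offset) still matches in this model, while a page where one of the words sits elsewhere does not. Selecting more and longer words spread across the page makes an accidental match with another template less likely.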

Configuring simple fields

For all fields other than tables, configuring the template consists of selecting a Custom Area and assigning it to a particular field.

For fixed form configurations, data fields can be configured only by using Custom Area selections.

For a field you can define one or more such Custom Areas, using the Add button. If two or more Custom Areas are defined for a single field, then at runtime, if the field is defined in the Taxonomy as Single Value, all values are concatenated into a single reported value. If the field is defined as Multi Value, then each value is reported individually.
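
The difference between Single Value and Multi Value reporting can be illustrated with a short Python sketch. The separator used when values are concatenated is not stated on this page, so the single space below is an assumption, and the function name `report_values` is hypothetical.

```python
from typing import List


def report_values(area_values: List[str], is_multi_value: bool,
                  separator: str = " ") -> List[str]:
    """Return the reported value(s) for a field with several Custom Areas."""
    if is_multi_value:
        # Multi Value: each Custom Area yields its own reported value.
        return list(area_values)
    # Single Value: all values are concatenated into one reported value.
    return [separator.join(area_values)] if area_values else []


# Usage: two Custom Areas defined for the same field.
print(report_values(["John", "Smith"], is_multi_value=False))
print(report_values(["John", "Smith"], is_multi_value=True))
```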

The icon beside each field indicates the type of supported selection: Tokens or Custom area.

Figure 10. Animated image showing the types of supported selections for sample fields

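Note:

If an empty area is selected, the selection is automatically set to Custom Area. If text is detected within the selected area, you are asked to choose the type of selection you want, Tokens or Custom Area.

Use the Selection Mode feature of the Validation Station to lock your choice between Tokens and Custom Area.

Configuring tables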

As mentioned above, there are fields where information can be added only by using Tokens (like the Page Matching Info fields) or only by using a Custom Area (like simple fields). For Table fields, you can do the following:

  • Define each cell one by one, once the Table Editor is expanded - by adding Custom Area selection to each cell individually;
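  • Use the table markup functionality - by marking the table area, drawing the row and column separators, and then assigning the table thus marked to a field. Make sure that the extracted area has the same number of columns and rows as the template area.

To use the table markup functionality, follow these steps:

  1. Select More Options for the table field.
  2. Select Extract new table.
  3. Select the table that you want to extract.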
  4. For every field above each table column, select the column name that you want it to represent. You can also choose to Extract header.
  5. Lastly, select Save new table.

Figure 11. Animated image of an example using the table markup functionality

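Anchor configuration

A distinct way of defining the custom area from which to extract data is by using field-level anchors. These anchors enable you to extract data based on field-level configurations, giving you more flexibility in defining form extraction rules.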

Consequently, at run-time, the Form Extractor knows how to perform the following:

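  • Determine whether a page-level template matches, and extract information based on the page-level template identified as the best match;
  • Determine whether any anchor-based settings match, and extract information based on how those settings apply to the document being processed;
  • Compute the corresponding confidence scores for all possible matches, so as to report the best result (the match with the highest probability) among all available options.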
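Creating a new anchor setting
  1. Make sure you are in the Anchor Selection mode.

  2. Draw a box around the value area.

  3. Select the label (the main anchor) for the value area, using one of the following methods:

    • Select the first word, then use Ctrl + Select on the last word of your selection.
    • Select, drag, and release to capture a range of words.
      Note:

      A label can only contain consecutive words on the same visual line.

  4. Select any other anchors that help uniquely identify your label. The same selection principles apply.

  5. Assign the anchor structure to the corresponding field by selecting Extract Value for that field.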

    Figure 12. Example of creating multiple anchors for a field

    Note:

    You can also use the previous examples from this page to learn how to create a template and define extraction areas and anchors.

Editing an existing anchor setting
  1. Highlight your anchor setting.

  2. Make changes to it (delete any anchor or label as needed, or even the value area, add new elements, and so on).

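  3. Select More Options for a field anchor, and then use the Change Extracted Value option to update your field association.

    Figure 13. Example of changing the extracted value for a field

    Note:
    • If you delete the target area, all anchors are deleted and you need to start over.
    • If you delete the label (the main anchor), the first anchor (in order of creation) becomes the new label.
Deleting an existing anchor setting

To delete an anchor setting, use one of the following options: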

  • Select More Options for a field anchor and use the Mark as Missing option for a saved value.

    Figure 14. Example of using the Mark as Missing option to delete an anchor setting

  • Select More Options for a field anchor and use the Remove Value option, in the case of a list of anchors defined for a given field.

    Figure 15. Example of using the Remove Value option to delete an anchor setting

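Mix and match configurations

You can define as many templates as you want for the same document type. You can have multiple page-level templates, a field can have multiple anchors, and a template can even contain both page-level and field-level anchors.

Note:
  • When defining field-level anchors, make sure that the label is close to the value area, and that other anchors support the label if the same text construct can be found in several places in the same document.
  • The longer the labels and anchors, the higher the precision you get.
  • The value area is always computed based on its position relative to the label (the main anchor). Choose the main anchor accordingly.
  • With field-level anchors, a field can move within the template and still be captured, offering more flexibility for changing document layouts.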
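
The statement that the value area is computed relative to the label can be illustrated with a short Python sketch: the offset between label and value area is stored at design time and applied to wherever the label is found at run-time. The names and coordinates are hypothetical, and the sketch ignores scaling and the supporting anchors.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    width: float
    height: float


def relative_offset(label: Box, value_area: Box) -> Tuple[float, float]:
    """Offset of the value area's upper-left corner from the label's corner."""
    return (value_area.left - label.left, value_area.top - label.top)


def locate_value_area(found_label: Box, offset: Tuple[float, float],
                      size: Tuple[float, float]) -> Box:
    """Place the value area relative to the label found in the processed document."""
    return Box(found_label.left + offset[0], found_label.top + offset[1],
               size[0], size[1])


# Design time: label and value area as drawn in the template.
template_label = Box(100, 200, 80, 20)
template_value = Box(200, 200, 150, 20)
offset = relative_offset(template_label, template_value)

# Run-time: the same label is found lower on the page.
runtime_label = Box(120, 450, 80, 20)
print(locate_value_area(runtime_label, offset, (150, 20)))
```

In this model the field is still located after it moves on the page, which is why the choice of main anchor matters: the value area follows the label.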

Document Understanding integration

The Form Extractor activity is part of the Document Understanding solutions. Visit the Document Understanding Guide for more information.
