Document Understanding Activities

Last updated April 22, 2026

RegEx Based Extractor

UiPath.IntelligentOCR.Activities.DataExtraction.RegexBasedExtractor

Description

Enables you to create and use custom regular expressions to extract information from a document. This activity can be used only together with the Data Extraction Scope activity.

Note:

This activity cannot work with set or boolean fields.

Project compatibility

Windows - Legacy | Windows

Configuration

Designer panel

Configure Expressions - Opens the Configure Regular Expressions wizard.

Properties panel

Common

  • DisplayName - The display name of the activity.

Input

  • Configuration - Specifies the configuration value for the extractor as a JSON escaped string. Use the extractor wizard to generate the configuration. You can keep the configuration in the Properties panel, as a string, or you can define it by using the wizard and bind it to a variable. It is advisable to edit the Configuration field by using the wizard and not the Properties panel.
  • Timeout - Specifies the timeout value for any Regex search, in milliseconds. A timeout of 0, or negative, is interpreted as infinite. The default value is 2000.
  • UseVisualAlignment - If selected, the regular expressions are applied to a text version generated from visual word alignment: words are separated by a single space character, lines by a single newline character, and pages by two newline characters. The default value is False. Use this option for complex layouts where it is easier to write regular expressions based on how words are visually organized on lines, ignoring any sentence, paragraph, or layout group otherwise identified in the document.
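
The text layout described for UseVisualAlignment can be pictured with a minimal Python sketch. The nested-list input shape and the function name `build_visual_text` are assumptions made for illustration only; they are not part of the activity, which builds this text internally.

```python
import re
from typing import List

# Hypothetical input shape: a page is a list of lines, a line is a list of words.
Page = List[List[str]]


def build_visual_text(pages: List[Page]) -> str:
    """Join words with one space, lines with one newline, pages with two newlines."""
    page_texts = []
    for page in pages:
        lines = [" ".join(words) for words in page]
        page_texts.append("\n".join(lines))
    return "\n\n".join(page_texts)


pages = [
    [["Invoice", "No:", "INV-001"], ["Total:", "42.00"]],
    [["Page", "2"]],
]
text = build_visual_text(pages)

# A regular expression can then rely on the visual line structure.
total = re.search(r"Total:\s+(\d+\.\d{2})", text).group(1)
```

With a layout like this, an expression only has to reason about spaces and line breaks, which is the simplification the option is meant to offer.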

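Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.

Using the Configure Regular Expressions wizard

  1. Add a RegEx Based Extractor activity to your workflow, inside a Data Extraction Scope activity.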

  2. Configure your regular expressions by selecting Configure Expressions. The Wizard window opens.

    Figure 1. Overview of the Configure Regular Expressions wizard

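  3. Expand a document type entry to see all of its defined fields and start configuring your regular expressions. The document types and their fields are read automatically from the project's taxonomy. The regular expression configuration options are available for every field in the taxonomy. Review the following configuration options that you may encounter in the wizard:

    • A document type can contain a regular (simple) field, which is displayed when you expand the document type. For a simple field, you can define only a single regular expression, by using the Configure Regular Expressions wizard that opens when you select Edit next to that field.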

    Figure 2. A document type in the Configure Regular Expressions wizard that has a regular field defined

    • A document type can contain a table field, which is displayed when you expand the document type, together with the table configuration options, such as an expression for the entire table content or an expression for individual rows. The following settings are available for a table field:
      • The Table Value RegEx captures the entire table area. If you leave the Table field line empty, the entire text content of the document is used for table processing.
      • The Rows Value RegEx captures an entire row from the captured table area. If you leave the Rows field line empty, the table area is split at each end-of-line. Each captured value is then treated as a row to which the column extraction is applied.
      • The Column Value RegEx captures the value of a specific column from each captured row.

    Figure 3. A document type in the Configure Regular Expressions wizard that has a table field defined

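Scenarios for using the table, row, and column regular expressions

Review the following scenarios to decide how to use the available table regular expression options:

  • If you leave both the Table RegEx and the Rows RegEx fields empty, the column-level regular expressions are applied to every line in the text version of the document to identify cell values.
  • If you define a regular expression that captures the table area but leave the Rows RegEx empty, each line in the table area is processed individually with each Column RegEx to capture cell values.
  • If you leave the Table RegEx empty but define a Rows RegEx, the Rows RegEx is applied to the entire text, and the Column RegEx is applied to each captured row to capture its cell values.
  • If you fill in both the Table RegEx and the Rows RegEx, the activity applies the Table RegEx to identify the table string, then the Rows RegEx to identify each row, and then the column-level regular expressions to capture cell values.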
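
The four scenarios above can be sketched as a small table, row, and column pipeline. This is an illustration in Python's `re` module, not the activity's implementation (which runs on .NET regular expressions). The function names, the choice to apply `re.MULTILINE` to the row pattern, and the choice to skip rows in which no column matches are all assumptions of this sketch.

```python
import re


def first_group(match):
    """Return the first capture group if the pattern defines one, else the whole match."""
    return match.group(1) if match.re.groups else match.group(0)


def extract_table(text, column_patterns, table_pattern=None, row_pattern=None):
    # Step 1: table area. With no table pattern, the whole text is the table area.
    if table_pattern:
        m = re.search(table_pattern, text)
        area = first_group(m) if m else ""
    else:
        area = text

    # Step 2: rows. With no row pattern, the area is split at each end-of-line.
    if row_pattern:
        # Assumption: ^ and $ should anchor at line boundaries for row patterns.
        rows = [first_group(m) for m in re.finditer(row_pattern, area, re.MULTILINE)]
    else:
        rows = area.splitlines()

    # Step 3: columns. Each column pattern is applied to each row.
    result = []
    for row in rows:
        cells = {}
        for name, pattern in column_patterns.items():
            m = re.search(pattern, row)
            cells[name] = first_group(m) if m else None
        # Assumption: rows where no column matched are dropped.
        if any(value is not None for value in cells.values()):
            result.append(cells)
    return result


text = (
    "Invoice INV-001\nItems:\nWidget 2 9.50\nGadget 1 20.00\n"
    "End of items\nThank you"
)
columns = {
    "Description": r"^(\w+)",
    "Quantity": r"\s(\d+)\s",
    "Amount": r"(\d+\.\d{2})$",
}

# Table pattern only: (?s) lets the dot cross line breaks inside the table area.
by_table = extract_table(text, columns, table_pattern=r"(?s)Items:\n(.*?)\nEnd of items")
# Row pattern only: rows are captured straight from the full text.
by_rows = extract_table(text, columns, row_pattern=r"^(\w+ \d+ \d+\.\d{2})$")
# Neither pattern: every line of the text is treated as a row.
all_lines = extract_table(text, columns)
```

In this sample, `by_table` and `by_rows` both yield the two item rows, while `all_lines` also picks up header and footer lines because the Description pattern matches them. That difference is the practical reason to narrow the table area or the rows first.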
  1. Add your regular expression in the Expression field. You can either write the whole RegEx directly in the Expression field or build it by using the Edit option.

    Important:

    For any of the regular expressions you define, make sure you have at least one capture group. Only the captured parts of an expression are used for value reporting.

  2. Select the dropdown list from the Regex Options column. You can set various regex options from this multi-select option.

    You can choose from the following options:

    • CultureInvariant - Specifies that cultural differences in language are ignored.

    • ECMAScript - Enables ECMA (European Computer Manufacturers Association) Script compliant behavior for the expression. This value can be used only in conjunction with the IgnoreCase and Multiline options.

    • ExplicitCapture - Specifies that the only valid captures are the ones of groups that are explicitly named or numbered and are defined as (?<name> subexpression). Any unnamed parentheses are ignored.

    • IgnoreCase - Specifies case-insensitive matching.

    • IgnorePatternWhitespace - Eliminates the unescaped white space from the defined pattern and enables the comments marked with # (hashtag symbol). This option does not apply to character classes, numeric quantifiers, or tokens marking the beginning of an individual RegEx language element.

    • Singleline - Specifies single-line mode. The dot (.) matches every character, including \n.

    • Multiline - Specifies multiline mode. With this option, the special characters ^ and $ match the beginning and end of any line.

    • RightToLeft - Specifies that the search runs from right to left.

      Note:

      Visit RegexOptions Enum for more information about the regular expression options you can use.

Figure 4. The expanded Regex Options dropdown showing the available options
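
To make the capture group requirement and a few of the options more concrete, here is a short sketch using Python's `re` module as a stand-in. The activity itself uses .NET regular expressions, so the flag names differ: `re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`, and `re.VERBOSE` are the closest Python analogs of IgnoreCase, Multiline, Singleline, and IgnorePatternWhitespace. The sample text is invented for illustration.

```python
import re

text = "Invoice No: inv-2041\nTotal due: 118.40 EUR"

# Capture group: the whole match includes the label, but only the group is the value.
m = re.search(r"Total due:\s*(\d+\.\d{2})", text)
whole, value = m.group(0), m.group(1)

# IgnoreCase analog: "INV" in the pattern matches "inv" in the text.
invoice_no = re.search(r"INV-(\d+)", text, re.IGNORECASE).group(1)

# Multiline analog: ^ matches at the start of every line, not only the first.
first_words_default = re.findall(r"^(\w+)", text)
first_words_multiline = re.findall(r"^(\w+)", text, re.MULTILINE)

# Singleline analog: the dot also matches the newline character.
no_match = re.search(r"No:(.*)due", text)
across_lines = re.search(r"No:(.*)due", text, re.DOTALL).group(1)

# IgnorePatternWhitespace analog: layout and # comments inside the pattern.
amount_pattern = re.compile(
    r"""
    Total\ due:\s*   # label; the escaped space is kept as a literal space
    (\d+\.\d{2})     # amount, the captured value
    """,
    re.VERBOSE,
)
amount = amount_pattern.search(text).group(1)
```

Here `whole` contains the label and the number, while `value` contains only the number, which mirrors the rule that only captured parts are reported.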

RegEx Builder wizard

  1. Select Edit to edit the options of that field and the format of the regular expression. The RegEx Builder wizard opens.

    Figure 5. Overview of the RegEx Builder wizard

  2. Input your desired text in the Test Text field. This is the text that you want to apply RegEx to based on the search criteria you choose. After that, insert a value in the Value field of the RegEx, which will then become highlighted in the Test Text field as well.

    Figure 6. Entering text in the Test Text field and highlighting a certain value from it using the Value field

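  3. Select a RegEx type from the dropdown list. This sets the regular expression to match one of the following characteristics:

    • Literal - Matches the exact characters you specify. This option is case-sensitive.
    • Digit - Matches a digit.
    • One of - Matches a single character that is in the set.
    • Not one of - Matches a single character that is not in the set.
    • Anything - Matches any character except \n.
    • Any word character - Matches any letter or digit.
    • Whitespace - Matches a whitespace character.
    • Starts with - Anchors the search to the beginning of the line.
    • Ends with - Anchors the search to the end of the line.
    • Advanced - Requires a custom expression.
    • Email - Matches an email address.
    • URL - Matches a URL.
    • US Date - Matches the US date format.
    • US Phone no. - Matches the US phone number format.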

    Figure 7. The dropdown list showing the available characteristics for the regular expression

    Note:

    Visit .NET regular expressions for more information about regular expressions in .NET.

  4. Use the Value field to enter the value for the regular expression.

  5. Select a quantifier from the Quantifiers dropdown list. You can choose from the following options:

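    • Exactly - Matches the preceding element exactly the specified number of times. The default is 1.
    • Any (0 or more) - Matches the preceding element zero or more times, but as few times as possible.
    • At least one (1 or more) - Matches the preceding element one or more times.
    • Zero or one - Matches the preceding element zero times or one time, but as few times as possible.
    • Between x and y times - Matches the preceding element between x and y times, where x and y are integers, but as few times as possible.
  6. To edit the fields, you can use the following options:

    1. Select Add to add an extra RegEx field.
    2. Select Move up and Move down to move fields up and down in the hierarchy.
    3. Select Remove to delete the field.
  7. Select the Capture check box if you want to extract that specific field.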

  8. The Full Expression field shows the entire expression exactly as you configured it.

  9. Select one or multiple options from the Regex Options dropdown list.

    Figure 8. The available options in the Regex Options dropdown list

  10. Select Save once all your configurations are done to exit the Edit mode.

  11. Select Save again to close the wizard.
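
As a rough mental model of how the builder's fields (type, value, quantifier, and the Capture check box) could combine into the Full Expression, consider the following Python sketch. Everything in it is hypothetical: the `Part` class, the mapping from type names to regex fragments, and the quantifier suffixes are guesses based on the descriptions above, not the expressions the wizard actually generates. Python's `re` module stands in for .NET regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from builder types to regex fragments.
TYPE_FRAGMENTS = {
    "Literal": lambda v: re.escape(v),
    "Digit": lambda v: r"\d",
    "One of": lambda v: "[" + re.escape(v) + "]",
    "Not one of": lambda v: "[^" + re.escape(v) + "]",
    "Anything": lambda v: ".",
    "Any word character": lambda v: r"\w",
    "Whitespace": lambda v: r"\s",
    "Advanced": lambda v: v,
}


@dataclass
class Part:
    kind: str
    value: str = ""
    quantifier: str = "Exactly"
    x: int = 1
    y: int = 1
    capture: bool = False


def quantifier_suffix(part: Part) -> str:
    """Translate a quantifier choice into a suffix (lazy where 'as few as possible')."""
    if part.quantifier == "Exactly":
        return "" if part.x == 1 else f"{{{part.x}}}"
    if part.quantifier == "Any (0 or more)":
        return "*?"
    if part.quantifier == "At least one (1 or more)":
        return "+"
    if part.quantifier == "Zero or one":
        return "??"
    if part.quantifier == "Between x and y times":
        return f"{{{part.x},{part.y}}}?"
    raise ValueError(f"Unknown quantifier: {part.quantifier}")


def build_expression(parts) -> str:
    pieces = []
    for part in parts:
        fragment = TYPE_FRAGMENTS[part.kind](part.value)
        suffix = quantifier_suffix(part)
        # Multi-character fragments need grouping before a quantifier applies to them.
        if suffix and part.kind in ("Literal", "Advanced"):
            fragment = f"(?:{fragment})"
        piece = fragment + suffix
        pieces.append(f"({piece})" if part.capture else piece)
    return "".join(pieces)


# A literal label followed by a captured run of digits.
total_expr = build_expression([
    Part("Literal", "Total: "),
    Part("Digit", quantifier="At least one (1 or more)", capture=True),
])
total = re.search(total_expr, "Grand Total: 1184 EUR").group(1)

# "As few times as possible": a lazy range stops at the minimum that still matches.
lazy_expr = build_expression([
    Part("Digit", quantifier="Between x and y times", x=2, y=4, capture=True),
])
lazy = re.search(lazy_expr, "1184").group(1)
```

The second example shows what "as few times as possible" implies: with nothing after the quantified element to force a longer match, a lazy range between 2 and 4 digits captures only two digits.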

Document Understanding integration

The RegEx Based Extractor activity is part of the Document Understanding solutions.
