
Document Understanding release notes

Last updated: April 23, 2026

General ML package and public endpoint updates

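UiPath Document OCR public endpoint release

Release date: January 20, 2026

Improvements

  • Improved handling of company names that appear as logos or stylized text during OCR in Document Understanding.
  • Enhanced OCR processing of Arabic ID numbers that contain Arabic digits. This update improves digit recognition for supported Arabic ID formats, which helps reduce dropped or misrecognized characters during extraction.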

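UiPath Document OCR public endpoint release

Release date: December 4, 2025

Improvements

Added support for Hawaiian diacritical marks, improving recognition accuracy for documents that contain Hawaiian text.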

UiPath Helix Extractor public endpoint improvements

Release date: November 7, 2025

New features

  • Public endpoints for extraction models in the United States, with the exception of Financial Statements, are now based on the Helix Extractor.
  • Public endpoints for the United States now include Receipts2, Invoices2, Receipts Japan, and US Mortgage Closing Disclosures. For the full list, see the Public endpoints documentation and expand the table to scroll through all endpoints. Make sure each activity is connected to the correct endpoint for its server region.

Erratum - added January 16, 2025: As part of our ongoing product evolution and portfolio alignment, we have updated the product name to UiPath Helix Extractor. All references in this document reflect this change.

UiPath Helix Extractor public endpoint improvements

New features

  • Public endpoints for extraction models in Japan, with the exception of Financial Statements, are now based on the Helix Extractor.
  • Public endpoints for Japan now include Receipts2, Invoices2, and US Mortgage Closing Disclosures. For the full list, see the Public endpoints documentation and expand the table to scroll through all endpoints. Make sure each activity is connected to the correct endpoint for its server region.

Erratum - added January 16, 2025: As part of our ongoing product evolution and portfolio alignment, we have updated the product name to UiPath Helix Extractor. All references in this document reflect this change.

UiPath Helix Extractor public endpoint improvements

Release date: August 14, 2025

New features

We are excited to announce improved accuracy for public endpoints based on the UiPath Helix Extractor in the Europe region.

With this release, the following models are now also based on the UiPath Helix Extractor in the Europe region:

  • 9465
  • Invoices China
  • Invoices Hebrew
  • Invoices Japan
  • Receipts Japan

With this addition, all models in the Europe region are now based on the UiPath Helix Extractor, except for Financial Statements.

New document types

This release introduces the following new document types:

  • Invoices2: this document type is trained to extract key data points from a wide variety of invoice types, including standard invoices, credit notes, Indian invoices, and shipping invoices. The enhanced schema consists of 55 regular fields, two structured tables (one for line items with 13 columns and one for bank payment details with 9 columns), and a currency classifier.
  • Receipts2: this document type is trained to extract key data points from a wide range of receipt types, including but not limited to parking receipts, train tickets, hotel bills, airport purchases, meals, pharmacy receipts, and electronic receipts. The enhanced schema consists of 60 regular fields and a structured table of items with 6 columns.
  • US Mortgage Closing Disclosures: this document type is trained to extract key data points from standard US closing disclosure forms. The schema includes 20 regular fields capturing borrower details, property information, loan terms, transaction summaries, and cash-to-close figures.

Erratum - added January 16, 2025: As part of our ongoing product evolution and portfolio alignment, we have updated the product name to UiPath Helix Extractor. All references in this document reflect this change.

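Enhanced character recognition

Release date: June 23, 2025

This update introduces a series of improvements to our optical character recognition (OCR) engine, aimed at increasing accuracy and reliability across a wider range of input styles:

  • Enhanced handwriting detection.
  • Better distinction between "O" and "0" in monospaced fonts.
  • Improved recognition of dot-matrix printed text.

Together, these improvements increase reliability across a variety of document types.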

Receipts Japan document type available through APIs and activities

Release date: May 19, 2025

You can now also use the Receipts Japan document type through the APIs and through the IntelligentOCR and Document Understanding activities. This is available for all tenants in the Japan region.

Removal of out-of-the-box models older than version 2023.4

Release date: April 11, 2025


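With newer alternatives and upgraded solutions now available, we have removed support for out-of-the-box models older than version 2023.4, including UiPath Document Understanding OCR and OCR_CPU, the Data Extraction ML packages, and the document classifiers.

The following table details the removal plan.

| Feature or functionality | Removal announced | Removal date | Notes |
| --- | --- | --- | --- |
| Out-of-the-box ML packages older than 2023.4 | April 2025 | April 2025 | We recommend using the latest version of the out-of-the-box ML packages. |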

For more information on deprecations, check the Deprecation timeline in the Overview guide.

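UiPath Document OCR public endpoint release

Release date: March 19, 2025

Improvements

  • Improved overall detection of checkboxes, handwriting, and printed text.
  • Improved handwriting recognition accuracy.
  • Improved stamp detection for Japanese stamps.
  • Improved accuracy of bounding box sizing and positioning.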

UiPath Helix Extractor public endpoints release

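Release date: November 28, 2024

New document type

This release introduces a new document type: Receipts Japan. This new public endpoint extracts key details from a variety of Japanese receipts, such as regular cash register receipts, restaurant receipts, inn receipts, train receipts, parking receipts, and other types of Japanese receipts.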

Public endpoints for Invoices China and Invoices Japan based on UiPath Helix Extractor

We are excited to announce the release of improved endpoints for Invoices China and Invoices Japan. This new generation of endpoints, based on the UiPath Helix Extractor, the new UiPath LLM, brings enhanced accuracy and performance.

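Invoices Japan improvements

We have made significant improvements to the Invoices Japan public endpoint, adding new fields such as:

  • Regular fields:
    • Net Amount Reduced
    • Tax Amount Reduced
    • Net Amount Non-Reduced
    • Tax Amount Non-Reduced
    • Withholding Tax
    • Deposit
  • Column fields:
    • Item Tax Rate
    • Item Registration Tax
    • Item Fee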

Erratum - added January 16, 2025: As part of our ongoing product evolution and portfolio alignment, we have updated the product name to UiPath Helix Extractor. All references in this document reflect this change.

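Invoices Japan public endpoint release

Release date: October 29, 2024

Released in the Invoices Japan endpoint

Improvements

We have made significant improvements to the document digitization process. When you use UiPath Extended Languages OCR or UiPath Chinese, Japanese, Korean OCR, the output now consists of regular word boxes instead of individual characters.

New Invoices Japan public endpoint release

Release date: October 15, 2024

Released in the Invoices Japan endpoint

Improvements

  • Improved the accuracy of the Invoices Japan ML package.
  • Enhanced spacing and word analysis when Chinese, Japanese, or Korean characters are mixed with Latin characters, punctuation, and digits in a document.
  • Fixed an issue that caused the AI Center training pipeline to incorrectly report high scores for the ID Number and Phone Number field types. Reported scores now match the actual scores.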

UiPath Extended Languages OCR general availability

Release date: October 3, 2024

We are excited to announce that our latest OCR engine, UiPath Extended Languages OCR, is now generally available. The new OCR can digitize documents in over 200 languages and brings a significant improvement over its predecessor, especially for Chinese, Japanese, and Korean. It can also process documents in Thai, Vietnamese, all major Indian languages, languages that use the Cyrillic alphabet, and Greek.

The UiPath Extended Languages OCR is currently only available as a public endpoint.

New public endpoints based on UiPath® Helix Extractor

Release date: September 17, 2024

Improved performance and new model endpoints based on UiPath Helix Extractor

This release brings enhanced accuracy and performance for models based on the UiPath Helix Extractor, the new UiPath LLM. Furthermore, the following models are now based on the UiPath Helix Extractor as well:

  • 709
  • 941x
  • 1040x
  • 3949
  • 3949a

Model endpoint redirected to the previous generation

Due to performance issues, the Financial Statements model endpoint is redirected to the previous generation.

Preview model removed

The 990 (Preview) model is removed from both public endpoints and Data Extraction ML packages.

Erratum - added January 16, 2025: As part of our ongoing product evolution and portfolio alignment, we have updated the product name to UiPath Helix Extractor. All references in this document reflect this change.

Deprecation of UiPath Chinese, Japanese, Korean OCR

Release date: July 8, 2024

UiPath Chinese, Japanese, Korean OCR will be deprecated starting in January 2025. We recommend using UiPath Extended Languages OCR instead.

Check the Deprecation timeline page for more information about upcoming deprecations and removals.

Public endpoints for Invoices and Receipts based on UiPath® Helix Extractor

Release date: June 12, 2024

We are excited to announce the release of improved endpoints for Invoices and Receipts. This new generation, based on the UiPath Helix Extractor, the new UiPath LLM, brings enhanced accuracy and performance.

We are gradually replacing our models with a new generation. For now, all public endpoints are based on the Helix Extractor, except for the following endpoints:

  • 709
  • 941x
  • 1040x
  • 3949a
  • 9465
  • Invoices China
  • Invoices Hebrew
  • Invoices Japan

Check the release notes for future announcements.

Erratum - added January 16, 2025: As part of our ongoing product evolution and portfolio alignment, we have updated the product name to UiPath Helix Extractor. All references in this document reflect this change.

Public endpoints based on Helix Extractor

Release date: May 29, 2024

We are excited to announce the release of improved endpoints for our pre-trained, out-of-the-box ML packages. This new generation, based on the UiPath Helix Extractor, the new UiPath® LLM, brings enhanced accuracy and performance.

We are gradually replacing our models with a new generation. For now, all public endpoints are based on the Helix Extractor, except for the following endpoints:

  • 709
  • 941x
  • 1040x
  • 3949a
  • 9465
  • Invoices
  • Invoices China
  • Invoices Hebrew
  • Invoices Japan
  • Receipts

Check the release notes for future announcements.

Erratum - added January 16, 2025: As part of our ongoing product evolution and portfolio alignment, we have updated the product name to UiPath Helix Extractor. All references in this document reflect this change.

UiPath Extended Languages OCR (Public Preview)

Release date: March 28, 2024

We are excited to announce that our latest OCR engine, UiPath Extended Languages OCR, is now in Public Preview. The new OCR can digitize documents in over 200 languages and brings a significant improvement over its predecessor, especially for Chinese, Japanese, and Korean. It can also process documents in Thai, Vietnamese, all major Indian languages, languages that use the Cyrillic alphabet, and Greek.

The UiPath Extended Languages OCR is currently only available as a public endpoint.

Frozen Backbone training

Release date: April 27, 2023

ML package versions v23.4 and later now have the option to train using Frozen Backbone. This approach trains faster and gives better results for small or low-diversity training sets of fewer than 400 pages. You can override this behavior with the new Training Pipeline environment variables described in the official documentation.

Invoices Australia deprecation

Release date: November 29, 2022

An upcoming deprecation has been announced for the Invoices Australia pre-trained ML package. We recommend using the Invoices ML package instead. For more details, see the Invoices ML package documentation.

ML Classification endpoint public preview

Release date: June 27, 2022

Released in endpoints

The ML Classification endpoint is now available in public preview.

UiPath Chinese, Japanese, Korean OCR endpoint release

Release date: June 20, 2022

Released in endpoints

The UiPath Chinese, Japanese, Korean OCR public endpoint is now generally available.

Data Extraction ML packages

Release date: June 6, 2022

Released in AI Center Cloud for Data Extraction ML packages

A new OCR method, UiPath Chinese, Japanese, Korean OCR, is now available and can be applied to new or existing Document Understanding projects (cloud only).
