
Unstructured and complex documents user guide

Last updated: September 30, 2025

Model configuration

Overview

You can configure the underlying LLM and its settings using the Model configuration option on the Build tab.



The available settings include:
  • Intelligent pre-processing:
    • Table model - mini
    • Table model
  • Extraction model:
    • GPT-4o
    • Gemini
  • Advanced options:
    • Attribution
    • Temperature
    • Top P
    • Seed
    • Frequency penalty
    • Prompt override

Adjust these settings to improve the accuracy of the model's predictions and enhance its performance.
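
To see the whole configuration surface at a glance, the following is a minimal sketch that models the settings listed above as a plain Python dataclass. The class name, field names, and defaults are illustrative assumptions; they do not mirror the IXP API or its actual default values.

```python
# Illustrative only: a plain-Python view of the Model configuration settings
# described above. Names and defaults are assumptions, not the IXP API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfiguration:
    # Intelligent pre-processing: the default (no table model), "Table model - mini", or "Table model"
    pre_processing: Optional[str] = None
    # Extraction model: "GPT-4o" or "Gemini"
    extraction_model: str = "Gemini"
    # Advanced options
    attribution: str = "Rules-based"        # or "Model-based"
    temperature: float = 0.0                # 0.0 to 2.0
    top_p: float = 1.0                      # 0.0 to 1.0
    seed: Optional[int] = None              # same seed + parameters -> repeatable results
    frequency_penalty: float = 0.0          # -2.0 to 2.0
    prompt_override: Optional[str] = None   # disabled by default
```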

Intelligent pre-processing

Intelligent pre-processing options improve prediction performance when documents are difficult for models to interpret due to complex formatting.

This includes the following options:
  • The default option, suitable for most documents without tabular content.
  • Table model - mini - Optimized for tabular content and latency. This option is best suited for documents with simple tables or multiple tables.
  • Table model - Optimized for more complex tabular content. This option is best suited for documents with complex nested tables, tables with merged cells, bullet points, or tables spanning across multiple pages.
    Note:
    • While this performs best on complex tables, it increases the latency of predictions.
    • This feature relies on Gemini models through the AI Trust Layer.

Intelligent pre-processing example

The following image shows an example of an extraction where the LLM was queried without the Table model mode, in which the values of the this period column are confused with the values of the year to date column.

The following image shows an example of an extraction using the Table model mode, in which the values of both the this period and year to date columns are extracted correctly.
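
Because the screenshots are not reproduced in this text, the difference can be sketched with made-up values: without the Table model, the year to date figure is returned for the this period field, whereas with the Table model each column keeps its own value.

```python
# Made-up values that mimic the example above; not actual product output.
without_table_model = {
    "this period": "1,250,000",   # wrong: this is actually the "year to date" figure
}
with_table_model = {
    "this period": "310,000",
    "year to date": "1,250,000",  # both columns extracted correctly
}
```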

Extraction model

The Extraction model option represents the underlying LLM used for extraction.

The available models are:
  • GPT-4o
  • Gemini

Choosing the most suitable model

Different models perform differently for different use cases, but we recommend using Gemini where possible. Several other pre- and post-processing features, which help optimize performance and user experience, are also Gemini-based.

GPT-4o is limited to 50 pages and can only process longer documents through the iterative calling feature, which is currently in preview. Gemini does not have this restriction and can process documents of up to 200 pages in IXP in a single call. The Gemini limit may vary slightly depending on the density of field values within the document.

In addition, the Gemini model has a default input limit of 200 pages, compared to the 50-page input limit of GPT-4o.
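
As a rough illustration of these limits, the sketch below counts the pages of a local PDF and suggests which model fits within a single call. It assumes the pypdf library and a local file path; the helper name and decision logic are illustrative and are not part of IXP.

```python
# Sketch: pick an extraction model based on the page limits described above.
# Assumes pypdf is installed; the thresholds come from this guide, the helper is illustrative.
from pypdf import PdfReader

GPT_4O_PAGE_LIMIT = 50    # single-call limit for GPT-4o
GEMINI_PAGE_LIMIT = 200   # approximate single-call limit for Gemini in IXP

def suggest_extraction_model(pdf_path: str) -> str:
    pages = len(PdfReader(pdf_path).pages)
    if pages <= GPT_4O_PAGE_LIMIT:
        return "GPT-4o or Gemini"                 # either model handles it in one call
    if pages <= GEMINI_PAGE_LIMIT:
        return "Gemini"                           # GPT-4o would need iterative calling (preview)
    return "Gemini, possibly split into chunks"   # limit varies with field-value density

print(suggest_extraction_model("sample_contract.pdf"))
```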

Switching from one model to another

To switch from one model to another, use the dropdown list of the Extraction model option and select Save. This will trigger a new project version to be created and new predictions to be generated automatically.

Important: In mature projects, taxonomies (particularly instructions) and confirmed predictions (particularly for inferred fields) are typically optimized for one model type over the other. Performance scores are therefore likely to drop after switching, because some iteration on instructions and re-review of predictions may be needed to undo model-specific optimizations that hold back the other model.

If you need to switch the model for performance reasons, first check whether the alternative model can solve the core problem that the current model cannot. If it can, optimize the new model to improve the performance metrics in Measure.

Advanced options

Advanced options allow you to customize the settings for your models, select which attribution method to use, and use the prompt override.

Note: Using prompt override is only recommended in exceptional cases.

Expand the setting to view all available options:

  • Attribution - The method used for attributing predictions to the relevant part or text in the document. Select one of the following options:
    • Rules-based - Uses an extensive set of rules and heuristics to match the correct spans on a page to the predicted values from the model. This is a low-latency option, but it sacrifices performance in terms of successful attributions compared to the model-based option.
    • Model-based - Uses an additional LLM call to successfully match the predicted values to the correct spans on the page, as these values can often be repeated in different parts of the page. This is the most performant option in terms of successful attributions, but it does add some latency to predictions. This option relies on using Gemini models.
  • Temperature - The sampling temperature to use. Select a number between 0.0 and 2.0. Higher values make the output more random.
  • Top P - Samples only from the tokens within the top_p probability mass. Select a number between 0.0 and 1.0.
  • Seed - If specified, repeated requests with the same seed and parameters should return the same result.
  • Frequency penalty - Select a number between -2.0 and 2.0. Positive values decrease the model's likelihood of repeating tokens that have already appeared in the text.
  • Prompt override - Overrides the default system prompt with a new value. This option is disabled by default. Once enabled, the Append task instructions prompt and the Append field instructions prompt options are enabled for configuration.
Note: The UiPath® team has researched and optimized the default values for model settings such as Temperature, Top P, and Frequency penalty. As a result, you do not need to adjust these values unless you know which specific settings you need. The sketch below illustrates how these sampling settings interact.
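
The following is a generic illustration of how Temperature, Top P, and Seed affect the choice of the next token; it uses made-up logits and NumPy and does not reflect IXP internals.

```python
# Generic illustration of Temperature, Top P, and Seed; made-up logits, not IXP internals.
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0, seed=None):
    rng = np.random.default_rng(seed)           # same seed + parameters -> same pick
    scaled = np.asarray(logits) / temperature   # higher temperature flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]             # most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1   # smallest set covering top_p mass
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))

token_logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(token_logits, temperature=0.7, top_p=0.9, seed=42))
```

A Frequency penalty would additionally down-weight tokens that have already appeared in the generated text before this sampling step.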

