Communications Mining User Guide

Last updated November 10, 2025

Elasticsearch integration

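Communications Mining™ provides a rich set of built-in analytics tools. Sometimes, however, you need to join Communications Mining predictions with data that cannot be uploaded as part of a Communications Mining comment. A common solution in these cases is to index the Communications Mining predictions and any other data into Elasticsearch, and to use a tool such as Kibana to drive the analytics. This tutorial describes how to import Communications Mining data into Elasticsearch and visualize it in Kibana.

The data used in the examples in this tutorial consists of synthetic emails generated for the insurance domain.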

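Storing data in Elasticsearch

First, let's define the data to import into Elasticsearch. The Communications Mining API provides the comment text, comment metadata, predicted labels, and predicted general fields in a nested JSON object. The following is an example of a raw comment as provided by the Communications Mining API.

Note: You may see different metadata fields depending on how the data was ingested into Communications Mining. To learn more about the fields of the comment object, check Comments.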

{
  "comment": {
    "id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
    "uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
    "timestamp": "2021-03-29T08:36:25.607Z",
    "messages": [
      {
        "body": {
          "text": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL"
        },
        "subject": {
          "text": "Change of address - Policy SFG48807871"
        },
        "from": "[email protected]",
        "to": ["[email protected]"],
        "sent_at": "2021-03-29T08:36:25.607Z"
      }
    ]
    // (... more properties ...)
  },
  "labels": [
    {
      "name": ["Admin"],
      "probability": 0.9995054006576538
    },
    {
      "name": ["Admin", "Change of address"],
      "probability": 0.9995054006576538
    }
  ],
  "entities": [
    {
      "name": "address-line-1",
      "formatted_value": "19 Essex Gardens",
      "span": {
        "content_part": "body",
        "message_index": 0,
        "char_start": 63,
        "char_end": 79,
        "utf16_byte_start": 126,
        "utf16_byte_end": 158
      }
    },
    {
      "name": "post-code",
      "formatted_value": "SW17 2UL",
      "span": {
        "content_part": "body",
        "message_index": 0,
        "char_start": 81,
        "char_end": 89,
        "utf16_byte_start": 162,
        "utf16_byte_end": 178
      }
    },
    {
      "name": "policy-number",
      "formatted_value": "SFG48807871",
      "span": {
        "content_part": "subject",
        "message_index": 0,
        "char_start": 27,
        "char_end": 38,
        "utf16_byte_start": 54,
        "utf16_byte_end": 76
      }
    }
  ]
}

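The schema of the raw comment returned by the Communications Mining API is not convenient for filtering and querying the data in Elasticsearch, so you should change the schema before ingesting the data into Elasticsearch. The following is an example of a flattened schema you can use. Add all the fields that your use case requires.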

{
  "id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
  "uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
  "timestamp": "2021-03-29T08:36:25.607Z",
  "subject": "Change of address - Policy SFG48807871",
  "body": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL",
  // (... more fields ...)
  "labels": ["Admin", "Admin > Change of address"],
  "entities": {
    "policy-number": ["SFG48807871"],
    "address-line-1": ["19 Essex Gardens"],
    "post-code": ["SW17 2UL"]
  }
}

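Note that a comment can have zero, one, or multiple labels, so the labels field must be an array. Similarly, if one or more general field types are configured for the dataset, a comment contains zero, one, or multiple general fields for each general field type. Hierarchical label names in the raw API response are themselves arrays (["Admin", "Change of address"]) and should be converted to strings ("Admin > Change of address").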
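The conversion from the raw comment to the flattened schema could be sketched as follows. This is a minimal illustration rather than an official implementation: the function name flatten_comment is hypothetical, and the sketch assumes that only the first message of each comment is of interest.

```python
from typing import Any, Dict, List


def flatten_comment(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one raw API result into a flat document for Elasticsearch."""
    comment = raw["comment"]
    # Assumption: only the first message is read. Comments with several
    # messages would need a decision on how to combine subjects and bodies.
    first_message = comment["messages"][0]

    # Group general field values by field name, keeping every occurrence.
    entities: Dict[str, List[str]] = {}
    for entity in raw.get("entities", []):
        entities.setdefault(entity["name"], []).append(entity["formatted_value"])

    return {
        "id": comment["id"],
        "uid": comment["uid"],
        "timestamp": comment["timestamp"],
        "subject": first_message["subject"]["text"],
        "body": first_message["body"]["text"],
        # ["Admin", "Change of address"] becomes "Admin > Change of address".
        "labels": [" > ".join(label["name"]) for label in raw.get("labels", [])],
        "entities": entities,
    }
```

Keeping labels as an array of strings and general fields as arrays keyed by name means a comment with no predictions still yields a valid document, with an empty list and an empty object respectively.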

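Fetching the data

To fetch the data, we recommend using a stream. For an overview of all available data download methods, check Downloading data. When creating a stream, set a threshold for each label so that labels with a confidence score below the threshold are discarded. The easiest way to do this is from the Streams page of the dataset in the Communications Mining™ user interface. Once confidence scores have been used to decide whether a label applies, you can import only the label names into Elasticsearch. For guidance on when we recommend dropping or keeping label confidence scores, check the documentation on labels for analytics.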
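The effect of per-label thresholds can be illustrated with a small sketch. A stream applies the configured thresholds for you, so the code below is only a local illustration of the logic; the function name labels_above_threshold, the thresholds mapping, and the default of 0.5 are all assumptions made for this example.

```python
from typing import Dict, List


def labels_above_threshold(
    labels: List[dict],
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,  # arbitrary default chosen for this sketch
) -> List[str]:
    """Return the names of labels whose probability meets their threshold."""
    kept = []
    for label in labels:
        name = " > ".join(label["name"])
        # Labels without their own entry fall back to default_threshold.
        if label["probability"] >= thresholds.get(name, default_threshold):
            kept.append(name)
    return kept
```

Only the label names are returned, which matches the recommendation to import names rather than confidence scores once the threshold decision has been made.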

General fields do not have confidence scores, so they need no special handling.
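With labels reduced to names and general fields passed through unchanged, the flattened documents are ready for indexing. One possible way to prepare them is to build a request body for the Elasticsearch bulk endpoint, as in the sketch below. The function name to_bulk_ndjson is hypothetical, the index name example-data is borrowed from the Timelion expression in this tutorial, and using the comment uid as the document _id is an assumption of this sketch.

```python
import json
from typing import Iterable


def to_bulk_ndjson(docs: Iterable[dict], index: str = "example-data") -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line. Assumption: the comment uid is used as _id so that
        # re-ingesting a comment overwrites the existing document.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["uid"]}}))
        # Source line: the flattened document itself.
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent to the _bulk endpoint with the Content-Type header set to application/x-ndjson.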

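Note:

Model change management

When creating a stream, specify a model version. This model version is used to provide predictions when comments are fetched from the stream. Even as users continue to train new model versions in the platform, your stream keeps using the model version you specified, which gives you deterministic results.

To upgrade to a new model version, you must create a new stream that uses that model version, and then update your code to use the new stream. (For this reason, we recommend making the stream name configurable in your code.) To keep analytics that rely on predictions consistent, re-ingest the predictions for historical data using the updated model version. You can do this by setting the stream to a timestamp earlier than your oldest comment and re-ingesting the data from the beginning.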
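One simple way to keep the stream name configurable is to read it from the environment, as in the sketch below. The variable name CM_STREAM_NAME and the default stream name are hypothetical and chosen only for illustration.

```python
import os


def get_stream_name(default: str = "elasticsearch-export-v1") -> str:
    """Return the stream name to fetch from, read from the environment."""
    # CM_STREAM_NAME is a hypothetical variable name used in this sketch.
    # Pointing it at a new stream switches model versions without a code change.
    return os.environ.get("CM_STREAM_NAME", default)
```

Switching to a stream that uses a newer model version then becomes a deployment setting rather than a code change.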

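Visualizing data in Kibana

Once your data is indexed in Elasticsearch, you can start building visualizations. This section provides simple examples for a number of common visualization tools in Kibana.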

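Timelion

You can use the following expression to chart the 5 most frequent labels over time. Note that this shows both top-level category labels and subcategory labels.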

.es(index=example-data,split=labels:5,timefield=@timestamp)
    .label("$1", "^.* > labels:(.+) > .*")