
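Communications Mining User Guide
Communications Mining™ provides a rich set of built-in analytics tools. Sometimes, however, you need to combine Communications Mining predictions with data that cannot be uploaded as part of a Communications Mining comment. In these cases, a common solution is to index the Communications Mining predictions and any other data in Elasticsearch and use a tool such as Kibana to drive analytics. This tutorial describes how to import Communications Mining data into Elasticsearch and visualize it in Kibana.
The data used in the examples in this tutorial consists of synthetic emails generated for the insurance domain.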
{
"comment": {
"id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
"uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
"timestamp": "2021-03-29T08:36:25.607Z",
"messages": [
{
"body": {
"text": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL"
},
"subject": {
"text": "Change of address - Policy SFG48807871"
},
"from": "[email protected]",
"to": ["[email protected]"],
"sent_at": "2021-03-29T08:36:25.607Z"
}
]
// (... more properties ...)
},
"labels": [
{
"name": ["Admin"],
"probability": 0.9995054006576538
},
{
"name": ["Admin", "Change of address"],
"probability": 0.9995054006576538
}
],
"entities": [
{
"name": "address-line-1",
"formatted_value": "19 Essex Gardens",
"span": {
"content_part": "body",
"message_index": 0,
"char_start": 63,
"char_end": 79,
"utf16_byte_start": 126,
"utf16_byte_end": 158
}
},
{
"name": "post-code",
"formatted_value": "SW17 2UL",
"span": {
"content_part": "body",
"message_index": 0,
"char_start": 81,
"char_end": 89,
"utf16_byte_start": 162,
"utf16_byte_end": 178
}
},
{
"name": "policy-number",
"formatted_value": "SFG48807871",
"span": {
"content_part": "subject",
"message_index": 0,
"char_start": 27,
"char_end": 38,
"utf16_byte_start": 54,
"utf16_byte_end": 76
}
}
]
}

{
"id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
"uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
"timestamp": "2021-03-29T08:36:25.607Z",
"subject": "Change of address - Policy SFG48807871",
"body": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL",
// (... more fields ...)
"labels": ["Admin", "Admin > Change of address"],
"entities": {
"policy-number": ["SFG48807871"],
"address-line-1": ["19 Essex Gardens"],
"post-code": ["SW17 2UL"]
}
}
The labels field must be an array. Additionally, if one or more general field types are configured for the dataset, a comment will contain zero, one, or multiple general fields for each general field type. Hierarchical label names in the raw API response are themselves arrays (["Admin", "Change of address"]) and should be converted to strings ("Admin > Change of address").
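The conversion described above can be sketched in Python as follows. This is a minimal illustration rather than an official client: the function name `flatten_result` is hypothetical, and the sketch assumes each raw result has the `comment`, `labels` and `entities` shape shown in the example and that only the first message of each comment is indexed. Entity names are used unchanged as keys.

```python
from collections import defaultdict


def flatten_result(result: dict) -> dict:
    """Flatten one raw result into a document suitable for indexing.

    Assumes the raw shape shown in the example above; `flatten_result`
    is a hypothetical helper, not part of any Communications Mining SDK.
    """
    comment = result["comment"]
    message = comment["messages"][0]  # assumption: index the first message only

    # Group entity values by entity name, so each name maps to an array.
    entities = defaultdict(list)
    for entity in result.get("entities", []):
        entities[entity["name"]].append(entity["formatted_value"])

    return {
        "id": comment["id"],
        "uid": comment["uid"],
        "timestamp": comment["timestamp"],
        "subject": message["subject"]["text"],
        "body": message["body"]["text"],
        # Hierarchical names such as ["Admin", "Change of address"]
        # become the string "Admin > Change of address".
        "labels": [" > ".join(label["name"]) for label in result.get("labels", [])],
        "entities": dict(entities),
    }
```

Grouping values in a list per entity name keeps the document valid when a comment contains the same kind of entity more than once.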
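To fetch the data, we recommend using a stream. For an overview of all available data download methods, see Downloading data. When creating a stream, you should set a threshold for each label so that labels with a confidence score below the threshold are discarded. The easiest way to do this is from the dataset's "Streams" page in the Communications Mining™ user interface. Once the confidence score has been used to determine whether a label applies, you can import only the label name into Elasticsearch. For information on when we recommend dropping or keeping label confidence scores, see the section on labels for analytics.
General fields do not have confidence scores, so they need no special handling.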
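Once comments have been fetched and flattened, they can be sent to Elasticsearch in batches. The sketch below builds a request body in the newline-delimited JSON format expected by the Elasticsearch `_bulk` endpoint. The function name `build_bulk_payload` is hypothetical, the index name `example-data` is taken from the Timelion example later in this tutorial, and using the comment `uid` as the document `_id` is an assumption made here so that re-indexing the same comment overwrites it instead of creating a duplicate.

```python
import json
from typing import Iterable


def build_bulk_payload(documents: Iterable[dict], index: str = "example-data") -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint.

    Each document contributes two lines: an "index" action line and the
    document source. Assumes every document has a "uid" field.
    """
    lines = []
    for doc in documents:
        # Action line: index this document under its uid.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["uid"]}}))
        # Source line: the flattened document itself.
        lines.append(json.dumps(doc))
    # The bulk format requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

The resulting string would be sent as the body of a POST request to `_bulk` with the `Content-Type: application/x-ndjson` header.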
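Model change management
When you create a stream, you specify a model version. This model version is used to provide predictions when comments are fetched from the stream. Even if users continue to train new model versions in the platform, your stream keeps using the model version you specified, which gives you deterministic results.
To upgrade to a new model version, you must create a new stream that uses that model version and then update your code to use the new stream. (For this reason, we recommend making the stream name configurable in your code.) To keep analytics that use predictions consistent, you should re-fetch predictions for historical data using the updated model version. You can do this by setting the stream to a timestamp before your earliest comment and re-fetching the data from the beginning.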
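One simple way to keep the stream name configurable is to read it from the environment, so that switching to a stream with a new model version does not require a code change. This is only a sketch: the environment variable `CM_STREAM_NAME`, the default value, and the function name `get_stream_name` are all hypothetical.

```python
import os

# Hypothetical default; replace with the name of your own stream.
DEFAULT_STREAM_NAME = "elasticsearch-export-v1"


def get_stream_name(environ=os.environ) -> str:
    """Return the stream name to fetch from, read from CM_STREAM_NAME.

    CM_STREAM_NAME is an illustrative variable name, not one defined by
    Communications Mining. Falls back to DEFAULT_STREAM_NAME when unset.
    """
    name = environ.get("CM_STREAM_NAME", DEFAULT_STREAM_NAME).strip()
    if not name:
        raise ValueError("CM_STREAM_NAME is set but empty")
    return name
```

After creating a stream for the new model version, you would point `CM_STREAM_NAME` at it and re-run the export from the beginning.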
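Once the data is indexed in Elasticsearch, you can start building visualizations. This section provides simple examples for a number of common visualization tools in Kibana.
Timelion
You can use the following expression to chart the 5 most frequent labels over time. Note that this shows both top-level category labels and subcategory labels.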
.es(index=example-data,split=labels:5,timefield=@timestamp)
.label("$1", "^.* > labels:(.+) > .*")
Bar Chart
Pie Chart