Communications Mining user guide
Uploading a CSV file into a source
Note: Updating anything other than user properties causes general field annotations in associated datasets to be lost. This includes updating existing messages in a source, or changing message properties such as the message text, the sent_at timestamp, or the to and from fields. Pin the latest model version in the associated datasets before making such changes.
For details on creating a data source, check Creating or deleting a data source in the GUI.
- Navigate to the Administrator page.
- Select the Sources tab, and locate the source to which you want to upload data.
- Select the upload icon on the data source card.
- Use Select file to choose a CSV file from your computer.
- Select the CSV file you want to upload. Make sure the file meets the following criteria:
- The file should include headers on the first line and be delimited by commas or tabs.
- The file must contain a minimum of three columns:
- Message: the message text.
- Timestamp: when the message was created.
- Unique ID: a distinct identifier for each message.
- All text fields should be enclosed in double quotes in the file.
- The file must be encoded as UTF-8, UTF-16, or UTF-32. The platform automatically detects the correct encoding.
- The file should be 128 MiB or smaller. For larger files, split them into multiple files, each less than 128 MiB.
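The file criteria above can be illustrated with a short sketch. The column names and message contents here are purely illustrative; only the structure (header row, comma delimiter, three required columns, double-quoted text fields, UTF-8 encoding, 128 MiB limit) reflects the requirements stated above.

```python
import csv
import io

# Illustrative rows: each message needs a unique ID, a timestamp,
# and the message text. Column names are examples, not mandated.
rows = [
    {"id": "msg-0001", "timestamp": "2020-01-31T12:34:56Z",
     "message": "Hello, could you confirm the trade details?"},
    {"id": "msg-0002", "timestamp": "2020-01-31T13:05:00Z",
     "message": "Confirmed, thanks!"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer,
                        fieldnames=["id", "timestamp", "message"],
                        quoting=csv.QUOTE_ALL)  # enclose all fields in double quotes
writer.writeheader()
writer.writerows(rows)

csv_bytes = buffer.getvalue().encode("utf-8")  # UTF-8 encoding
assert len(csv_bytes) <= 128 * 1024 * 1024     # stay under the 128 MiB limit
print(buffer.getvalue())
```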
- Select the required columns, where the dropdown menus contain the column headers detected in the CSV file:
- Message Id Column - A column with a unique ID that identifies the message. Message IDs can only contain ASCII alphanumeric characters (A-Z, a-z, 0-9) and punctuation, except for forward slash /.
Note: If there are existing messages in the source with the same ID, they are updated to match the contents of the new file.
- Message Column - The column that contains the message text that you want to analyze in the platform.
- Timestamp Column - The column that contains the date and time when the message was recorded. The timestamp format is flexible, and the platform infers it automatically. For more details, check Using the correct formats.
- You can select the following additional columns, if you have data that contains subject lines, threads, or participants, usually encountered in cases or email threads:
- Subject Column - The column that contains the subject of the message.
- Sender Column - The column that contains the sender.
- To Column - The column that contains one or more recipients. Make sure that multiple recipients are separated by a semicolon ;.
- Cc Column - The column that contains one or more recipients in the Cc field. Make sure that multiple recipients are separated by a semicolon ;.
- For more details on using the correct formats in the Sender, To, and Cc fields, check Using the correct formats.
- Thread ID Column - The column that contains the message thread ID. The thread ID ties together messages that belong to the same thread.
- You can select the additional user properties that you want to upload with the messages. User properties are contextual metadata associated with each message that you can filter on in the platform. The machine learning models in the platform may also leverage these user properties, which are of the following types:
- String User Properties are categorical metadata, for example, IDs, countries, counterparties, and so on.
- Number User Properties are numeric metadata, for example, NPS, email statistics, amounts, and so on.
Note: If your file contains an NPS score as a user property, you must include it as a number property named exactly NPS to trigger the native NPS charts in the platform.
- Once you have selected all user properties, select Upload.
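User-property column headers follow a naming pattern (the same pattern quoted in the Invalid Header error in the table further down this page): `string:<name>` or `number:<name>`, where the name is 1 to 32 word characters. A minimal sketch of that rule, with an illustrative helper function that is not part of the platform:

```python
import re

# Simplified form of the property-header pattern from the error table:
# "string:" or "number:" followed by a 1-32 character word-only name.
PROPERTY_HEADER = re.compile(r"^(number|string):(\w(?:\w{0,30}\w)?)$")

def is_valid_property_header(header: str) -> bool:
    """Illustrative check: does this column header name a user property?"""
    return PROPERTY_HEADER.match(header) is not None

print(is_valid_property_header("number:NPS"))      # True - enables native NPS charts
print(is_valid_property_header("string:Country"))  # True
print(is_valid_property_header("string:ti:er"))    # False - ':' not allowed in the name
```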
You will be prompted to inspect the uploaded messages in a dataset that contains the source you uploaded data into. If the source is not associated with any datasets, you can create a new dataset to check that the upload is as expected.
Note: If you made a mistake when selecting the user properties, you can upload the same file again. The platform uses the message ID as the identifier to overwrite the existing messages and properties. This does not affect any labels applied to existing messages.
The Sender/To/CC format
Make sure that:
- The number of recipients does not exceed the maximum of 2,048 per thread.
- The sender or any recipient does not exceed the 512-character limit.
- There are no consecutive semicolons. For example, the following is incorrectly formatted: [email protected] ; [email protected].
The following are correctly formatted:
- Example 1: Robert Bog <[email protected]>; John Smith <[email protected]>
- Example 2: [email protected] ;[email protected]
- Example 3: [email protected] ; [email protected];
Before uploading your data, make sure the emails are formatted appropriately.
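The recipient rules above can be checked locally before uploading. This is a rough sketch, not platform code; it assumes semicolon-separated entries, tolerates a single trailing semicolon (as in Example 3), and applies the stated limits of 2,048 recipients and 512 characters per entry.

```python
def check_recipients(field: str) -> list[str]:
    """Return a list of formatting problems for a Sender/To/Cc field (illustrative)."""
    problems = []
    entries = [e.strip() for e in field.split(";")]
    # A single trailing semicolon leaves one empty entry at the end; allow it.
    if entries and entries[-1] == "":
        entries = entries[:-1]
    if "" in entries:
        problems.append("consecutive semicolons (empty recipient)")
    if len(entries) > 2048:
        problems.append("more than 2,048 recipients")
    for entry in entries:
        if len(entry) > 512:
            problems.append(f"recipient over 512 characters: {entry[:30]}...")
    return problems

# Hypothetical addresses for illustration only.
print(check_recipients("Robert Bog <robert@example.com>; John Smith <john@example.com>"))  # []
print(check_recipients("a@example.com ; ; b@example.com"))  # flags the empty recipient
```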
The Timestamp format
If the timestamp format is ambiguous, for example 01/02/03 10:10, you can suggest the correct interpretation:
- 2nd of January 2003 - None
- 1st of February 2003 - Day first
- 3rd of February 2001 - Year first
- 2nd of March 2001 - Day first + Year first
To avoid ambiguity, use the RFC 3339 format. For example, 2020-01-31T12:34:56Z for UTC, or with a timezone offset: 2020-08-31T11:20:50-08:00.
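The four interpretations of the ambiguous date above, and the unambiguous RFC 3339 alternative, can be sketched with Python's standard datetime tools (the flag names mirror the list; the format strings are this sketch's own mapping):

```python
from datetime import datetime, timezone

# The ambiguous date 01/02/03 under each interpretation flag.
for flag, fmt in [("None", "%m/%d/%y"), ("Day first", "%d/%m/%y"),
                  ("Year first", "%y/%m/%d"), ("Day first + Year first", "%y/%d/%m")]:
    print(flag, "->", datetime.strptime("01/02/03", fmt).date())

# RFC 3339 timestamps are unambiguous. fromisoformat parses the
# numeric-offset form directly (use "+00:00" rather than "Z" on
# Python versions before 3.11).
utc = datetime.fromisoformat("2020-01-31T12:34:56+00:00")
offset = datetime.fromisoformat("2020-08-31T11:20:50-08:00")
print(offset.astimezone(timezone.utc).isoformat())  # the same instant expressed in UTC
```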
This section contains the possible error messages that may occur during the upload process, and the solutions to each of them.
In the following error messages, {something} maps to contextual information about where the error occurred. Additionally, the way we refer to a position in the file is standardized as:
String | Expands to: |
---|---|
{position} | record {row-number} on line {line-number} column {column-number} (byte {byte-number}) |
The title of the error message is displayed along with a description.
Error Kind | Error Message | Description |
---|---|---|
Not Enough Columns | The CSV file only contains {number-columns} columns, but at least 3 are needed, that is, text, timestamp and id. | The uploaded CSV does not contain at least 3 columns, or the platform has mis-detected the encoding of the file. |
Invalid Encoding | The file contains invalid characters, where the encoding is detected as {detected-encoding}. | The file is not correctly encoded as UTF-8, UTF-16, or UTF-32. The platform automatically detects the format of the file. |
Invalid Header | 'string:ti:er' does not match '(^delimiter|id|message|timestamp|timestamp_default_utc_offset|timestamp_day_first|timestamp_year_first\\Z)|(^(?P<property_type>number|string):(?P<name>\\w(?:[\\w]{0,30}\\w)?)\\Z)' | If a column header is an invalid name for a user property, the platform returns the default message for when the schema of a request is invalid. Check that each column header is a valid format for its purpose. The maximum length for a column header is 32 alphanumeric characters. |
Unequal Row Lengths | The CSV contains unequal row lengths. Message {position} has {number} fields, but the previous record has {number} fields. | The CSV contains rows with different numbers of cells in them or that are inconsistent with the number of headers. |
Id format | Invalid message id for {record}. IDs can only consist of ASCII alphanumeric characters and punctuation, except for forward slash /. Cell value: {cell-value}. | Occurs when an ID field consists of invalid characters as described in the error message. |
Id length | The ID is too long for message {record}. It has {number} bytes, expected at most 1024. | Occurs when an ID field is longer than the maximum allowed length, 1024 bytes. |
Timestamp Format | Incorrectly formatted timestamp in message {position}: {timestamp-error-message}. Cell value: {cell-value}. | Occurs when a timestamp field could not be parsed. |
Message Length | Message is too long for message {position}. It has {number} bytes, expected at most 65536. | Occurs when a message field is longer than the maximum allowed length, 65536 bytes. |
Number Property Format | Incorrectly formatted number in message {position}: {number-error-message}. Cell value: {cell-value}. | Occurs when a number user property field could not be parsed. The platform should allow any format that can reasonably be decoded as a number. |
Property Length | Property is too long for message {position}. It has {number} bytes, expected at most 4096. | Occurs when a user property field is longer than the maximum allowed length, 4096 bytes. |
Unknown Error | Unknown CSV error: {underlying-error-message}. | If an unknown error occurs, retry the upload. |
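Several of the errors above (Unequal Row Lengths, Id length, Message Length) can be caught before uploading. A hedged local pre-check, assuming comma-delimited input and caller-supplied column positions; the limits are the byte limits from the table, and the function name is this sketch's own:

```python
import csv

def validate_rows(lines, id_col, msg_col):
    """Report rows that would trip the row-length, ID-length, or message-length errors."""
    reader = csv.reader(lines)
    header = next(reader)
    errors = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            errors.append(f"line {line_no}: {len(row)} fields, expected {len(header)}")
            continue
        if len(row[id_col].encode("utf-8")) > 1024:       # ID limit: 1024 bytes
            errors.append(f"line {line_no}: ID longer than 1024 bytes")
        if len(row[msg_col].encode("utf-8")) > 65536:     # message limit: 65536 bytes
            errors.append(f"line {line_no}: message longer than 65536 bytes")
    return errors

sample = ['id,timestamp,message',
          'msg-1,2020-01-31T12:34:56Z,"Hello there"',
          'msg-2,2020-01-31T12:35:00Z']  # missing the message field
print(validate_rows(sample, id_col=0, msg_col=2))
```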