Communications Mining user guide
Uploading a CSV file into a source
Note: Updating anything other than user properties causes general field annotations in associated datasets to be lost. This includes updating existing messages in a source, or changing message properties such as the message text, the sent_at timestamp, or the to and from fields. Pin the latest model version in the associated datasets before making such changes.
For details on creating a data source, check Creating or deleting a data source in the GUI.
- Navigate to the Administrator page.
- Select the Sources tab, and locate the source to which you want to upload data.
- Select the upload icon on the data source card.
- Use Select file to choose a CSV file from your computer.
- Select the CSV file you want to upload. Make sure the file meets the following criteria:
- The file should include headers on the first line and be delimited by commas or tabs.
- The file must contain a minimum of three columns:
- Message: the message text.
- Timestamp: when the message was created.
- Unique ID: a distinct identifier for each message.
- All text fields should be enclosed in double quotes in the file.
- The file must be encoded as UTF-8, UTF-16, or UTF-32. The platform automatically detects the correct encoding.
- The file should be 128 MiB or smaller. For larger files, split them into multiple files, each less than 128 MiB.
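The file criteria above can be illustrated with a short sketch. The column names and message contents here are purely illustrative; only the structure (header row, comma delimiter, three required columns, double-quoted text fields, UTF-8 encoding, 128 MiB limit) reflects the requirements stated above.

```python
import csv
import io

# Illustrative rows: each message needs a unique ID, a timestamp,
# and the message text. Column names are examples, not mandated.
rows = [
    {"id": "msg-0001", "timestamp": "2020-01-31T12:34:56Z",
     "message": "Hello, could you confirm the trade details?"},
    {"id": "msg-0002", "timestamp": "2020-01-31T13:05:00Z",
     "message": "Confirmed, thanks!"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer,
                        fieldnames=["id", "timestamp", "message"],
                        quoting=csv.QUOTE_ALL)  # enclose all fields in double quotes
writer.writeheader()
writer.writerows(rows)

csv_bytes = buffer.getvalue().encode("utf-8")  # UTF-8 encoding
assert len(csv_bytes) <= 128 * 1024 * 1024     # stay under the 128 MiB limit
print(buffer.getvalue())
```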
- Select the required columns, where the dropdown menus contain the column headers detected in the CSV file:
- Message Id Column - A column with a unique ID that identifies the message. Message IDs can only contain ASCII alphanumeric characters (A-Z, a-z, 0-9) and punctuation, except for forward slash /.
Note: If there are existing messages in the source with the same ID, they are updated to match the contents of the new file.
- Message Column - The column that contains the message text that you want to analyze in the platform.
- Timestamp Column - The column that contains the date and time when the message was recorded. The timestamp format is flexible, and the platform infers it automatically. For more details, check Using the correct formats.
- You can select the following additional columns, if you have data that contains subject lines, threads, or participants, usually encountered in cases or email threads:
- Subject Column - The column that contains the subject of the message.
- Sender Column - The column that contains the sender.
- To Column - The column that contains one or more recipients. Make sure that multiple recipients are separated by a semicolon ;.
- Cc Column - The column that contains one or more recipients in the Cc field. Make sure that multiple recipients are separated by a semicolon ;.
- For more details on using the correct formats in the Sender, To, and Cc fields, check Using the correct formats.
- Thread ID Column - The column that contains the message thread ID. The thread ID ties together messages that belong to the same thread.
- You can select the additional user properties that you want to upload with the messages. User properties are contextual metadata associated with each message that you can filter on in the platform. The machine learning models in the platform may also leverage these user properties, which are of the following types:
- String User Properties are categorical metadata, for example, IDs, countries, counterparties, and so on.
- Number User Properties are numeric metadata, for example, NPS, email statistics, amounts, and so on.
Note: If your file contains an NPS score as a user property, you must include it as a number property named exactly NPS to trigger the native NPS charts in the platform.
- Once you have selected all user properties, select Upload.
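User-property column headers follow a naming pattern (the same pattern quoted in the Invalid Header error in the table further down this page): `string:<name>` or `number:<name>`, where the name is 1 to 32 word characters. A minimal sketch of that rule, with an illustrative helper function that is not part of the platform:

```python
import re

# Simplified form of the property-header pattern from the error table:
# "string:" or "number:" followed by a 1-32 character word-only name.
PROPERTY_HEADER = re.compile(r"^(number|string):(\w(?:\w{0,30}\w)?)$")

def is_valid_property_header(header: str) -> bool:
    """Illustrative check: does this column header name a user property?"""
    return PROPERTY_HEADER.match(header) is not None

print(is_valid_property_header("number:NPS"))      # True - enables native NPS charts
print(is_valid_property_header("string:Country"))  # True
print(is_valid_property_header("string:ti:er"))    # False - ':' not allowed in the name
```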
You will be prompted to inspect the uploaded messages in a dataset that contains the source you uploaded data into. If the source is not associated with any datasets, you can create a new dataset to check that the upload is as expected.
Note: If you made a mistake when selecting the user properties, you can upload the same file again. The platform uses the message ID as the identifier to overwrite the existing messages and properties. This does not affect any labels applied to existing messages.
The Sender/To/CC format
Make sure that:
- The number of recipients does not exceed the maximum of 2,048 per thread.
- The sender or any recipient does not exceed the 512-character limit.
- There are no consecutive semicolons. For example, the following is incorrectly formatted: [email protected] ; [email protected].
The following are correctly formatted:
- Example 1: Robert Bog <[email protected]>; John Smith <[email protected]>
- Example 2: [email protected] ;[email protected]
- Example 3: [email protected] ; [email protected];
Before uploading your data, make sure the emails are formatted appropriately.
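The recipient rules above can be checked locally before uploading. This is a rough sketch, not platform code; it assumes semicolon-separated entries, tolerates a single trailing semicolon (as in Example 3), and applies the stated limits of 2,048 recipients and 512 characters per entry.

```python
def check_recipients(field: str) -> list[str]:
    """Return a list of formatting problems for a Sender/To/Cc field (illustrative)."""
    problems = []
    entries = [e.strip() for e in field.split(";")]
    # A single trailing semicolon leaves one empty entry at the end; allow it.
    if entries and entries[-1] == "":
        entries = entries[:-1]
    if "" in entries:
        problems.append("consecutive semicolons (empty recipient)")
    if len(entries) > 2048:
        problems.append("more than 2,048 recipients")
    for entry in entries:
        if len(entry) > 512:
            problems.append(f"recipient over 512 characters: {entry[:30]}...")
    return problems

# Hypothetical addresses for illustration only.
print(check_recipients("Robert Bog <robert@example.com>; John Smith <john@example.com>"))  # []
print(check_recipients("a@example.com ; ; b@example.com"))  # flags the empty recipient
```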
The Timestamp format
If the timestamp format is ambiguous, for example 01/02/03 10:10, you can suggest the correct interpretation:
- 2nd of January 2003 - None
- 1st of February 2003 - Day first
- 3rd of February 2001 - Year first
- 2nd of March 2001 - Day first + Year first
To avoid ambiguity, use the RFC 3339 format. For example, 2020-01-31T12:34:56Z for UTC, or with a timezone offset: 2020-08-31T11:20:50-08:00.
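The four interpretations of the ambiguous date above, and the unambiguous RFC 3339 alternative, can be sketched with Python's standard datetime tools (the flag names mirror the list; the format strings are this sketch's own mapping):

```python
from datetime import datetime, timezone

# The ambiguous date 01/02/03 under each interpretation flag.
for flag, fmt in [("None", "%m/%d/%y"), ("Day first", "%d/%m/%y"),
                  ("Year first", "%y/%m/%d"), ("Day first + Year first", "%y/%d/%m")]:
    print(flag, "->", datetime.strptime("01/02/03", fmt).date())

# RFC 3339 timestamps are unambiguous. fromisoformat parses the
# numeric-offset form directly (use "+00:00" rather than "Z" on
# Python versions before 3.11).
utc = datetime.fromisoformat("2020-01-31T12:34:56+00:00")
offset = datetime.fromisoformat("2020-08-31T11:20:50-08:00")
print(offset.astimezone(timezone.utc).isoformat())  # the same instant expressed in UTC
```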
This section contains the possible error messages that may occur during the upload process, and the solutions to each of them.
In the following error messages, {something} maps to contextual information about where the error occurred. Additionally, the way we refer to a position in the file is standardized as:
String | Expands to: |
---|---|
{position} | record {row-number} on line {line-number} column {column-number} (byte {byte-number}) |
The title of the error message is displayed along with a description.
Error Kind | Error Message | Description |
---|---|---|
Not Enough Columns | The CSV file only contains {number-columns} columns, but at least 3 are needed, that is, text, timestamp and id. | The uploaded CSV does not contain at least 3 columns, or the platform has mis-detected the encoding of the file. |
Invalid Encoding | The file contains invalid characters, where the encoding is detected as {detected-encoding}. | The file is not correctly encoded as UTF-8, UTF-16, or UTF-32. The platform automatically detects the format of the file. |
Invalid Header | 'string:ti:er' does not match '(^delimiter|id|message|timestamp|timestamp_default_utc_offset|timestamp_day_first|timestamp_year_first\\Z)|(^(?P<property_type>number|string):(?P<name>\\w(?:[\\w]{0,30}\\w)?)\\Z)' | If a column header is an invalid name for a user property, the platform returns the default message for when the schema of a request is invalid. Check that each column header is a valid format for its purpose. The maximum length for a column header is 32 alphanumeric characters. |
Unequal Row Lengths | The CSV contains unequal row lengths. Message {position} has {number} fields, but the previous record has {number} fields. | The CSV contains rows with different numbers of cells in them or that are inconsistent with the number of headers. |
Id format | Invalid message id for {record}. IDs can only consist of ASCII alphanumeric characters and punctuation, except for forward slash /. Cell value: {cell-value}. | Occurs when an ID field consists of invalid characters as described in the error message. |
Id length | The ID is too long for message {record}. It has {number} bytes, expected at most 1024. | Occurs when an ID field is longer than the maximum allowed length, 1024 bytes. |
Timestamp Format | Incorrectly formatted timestamp in message {position}: {timestamp-error-message}. Cell value: {cell-value}. | Occurs when a timestamp field could not be parsed. |
Message Length | Message is too long for message {position}. It has {number} bytes, expected at most 65536. | Occurs when a message field is longer than the maximum allowed length, 65536 bytes. |
Number Property Format | Incorrectly formatted number in message {position}: {number-error-message}. Cell value: {cell-value}. | Occurs when a number user property field could not be parsed. The platform should allow any format that can reasonably be decoded as a number. |
Property Length | Property is too long for message {position}. It has {number} bytes, expected at most 4096. | Occurs when a user property field is longer than the maximum allowed length, 4096 bytes. |
Unknown Error | Unknown CSV error: {underlying-error-message}. | If an unknown error occurs, retry the upload. |
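Several of the errors above (Unequal Row Lengths, Id length, Message Length) can be caught before uploading. A hedged local pre-check, assuming comma-delimited input and caller-supplied column positions; the limits are the byte limits from the table, and the function name is this sketch's own:

```python
import csv

def validate_rows(lines, id_col, msg_col):
    """Report rows that would trip the row-length, ID-length, or message-length errors."""
    reader = csv.reader(lines)
    header = next(reader)
    errors = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            errors.append(f"line {line_no}: {len(row)} fields, expected {len(header)}")
            continue
        if len(row[id_col].encode("utf-8")) > 1024:       # ID limit: 1024 bytes
            errors.append(f"line {line_no}: ID longer than 1024 bytes")
        if len(row[msg_col].encode("utf-8")) > 65536:     # message limit: 65536 bytes
            errors.append(f"line {line_no}: message longer than 65536 bytes")
    return errors

sample = ['id,timestamp,message',
          'msg-1,2020-01-31T12:34:56Z,"Hello there"',
          'msg-2,2020-01-31T12:35:00Z']  # missing the message field
print(validate_rows(sample, id_col=0, msg_col=2))
```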