Communications Mining user guide

Preparing data for .CSV upload

Note:

To upload CSV files into a source, you must be assigned the IXP Project Admin role as an Automation Cloud™ user, or have the Sources admin and Edit messages permissions as a legacy user.
For more details on how to upload data from a .csv, along with common error messages, check Uploading a CSV file into a source.

Before you upload data into Communications Mining™, consider the following factors when preparing the data for the platform to ingest:

Opening the .csv in Excel multiple times and making changes can introduce formatting issues that affect the upload process. To avoid this, make any updates directly within the .csv file.
Save the CSV file as CSV UTF-8 (Comma delimited) so that every field is wrapped in quotation marks. You can check the result in a text editor.
Important:
Make sure you upload a .csv file, not an Excel file.
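
If it is easier to script the save step than to rely on a spreadsheet export, the following is a minimal sketch of re-writing a CSV as UTF-8 with every field wrapped in quotation marks. It uses only the Python standard library (csv.reader, csv.writer with csv.QUOTE_ALL). The function names requote_csv and requote_file are hypothetical, and reading with utf-8-sig assumes the input may start with the byte order mark that some spreadsheet exports add.

```python
import csv
import io


def requote_csv(text: str) -> str:
    """Return the same CSV content with every field wrapped in double quotes."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    # QUOTE_ALL quotes every field, including empty ones and numbers.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


def requote_file(src_path: str, dst_path: str) -> None:
    """Read a CSV file and write a fully quoted UTF-8 copy (hypothetical helper)."""
    # Assumption: the input is UTF-8, possibly with a byte order mark.
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        quoted = requote_csv(src.read())
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(quoted)
```

Opening the output in a text editor should show each value enclosed in double quotes, with commas inside a value left untouched.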

Check for the items listed in the following table before uploading your .csv into the platform. This helps you avoid errors during upload, as well as data quality issues that negatively impact model performance.

Item	Description
Duplicate rows	Having the same data repeated multiple times across the data extract.
Mismatched headers	Having the wrong headers aligned to the wrong data fields.
Hanging rows or columns	Not having all the data contained in sequential rows. For example, having all messages in rows 1 to 10,000, but also having a cell containing data in row 19,999.
Inconsistent date formatting	Different rows with inconsistent date formats. For example, having some messages in US date format and others in EU date format in the same dataset, which causes issues when the dates are normalized downstream.
Incoherent sentences	These are sentences that contain an assortment of words without a clear syntactic or semantic structure. For example: The user is requesting a new portable 28442 298 ticket to be creaportableted.
Inconsistent spacing	When there is an irregular number of spaces between words. For example: The  policy   is set    to renew. instead of: The policy is set to renew.
Breaks in words	When there are breaks in the middle of a word. For example: The po licy is set to renew. instead of: The policy is set to renew.
Erroneous character encoding	When text data is not properly encoded, resulting in garbled or unreadable characters. For example: ThÇ åpp is gré¶t. instead of: The app is great.
Blank messages	Communications without any content included in the subject or body.
Messages with lots of typos	Text data containing many spelling errors.
Headers / footers	When there are headers or footers included. For example, spam warnings, virus scan warnings, and so on.
Metadata included in the subject/body instead of as a metadata property	When metadata is included in the subject or body. For example: [01/01/2023] I would like to renew my policy. as the body of a message, instead of: I would like to renew my policy. as the message with 01/01/2023 as the date included in the metadata.
Multiple messages combined into one message	When multiple messages that should have been split into separate thread messages are instead combined into a single communication.
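
Some of the items in the table can be screened for automatically before upload. The following is a rough sketch, not a platform feature: it flags duplicate rows, blank messages, rows with more fields than the header, irregular spacing, and a mix of day-first and month-first dates. The function names and the column names date, subject, and body are assumptions for illustration; adjust them to match your own headers. Items such as incoherent sentences or combined messages still need a manual review.

```python
import csv
import io
import re
from collections import Counter


def date_order(value: str) -> str:
    """Classify a numeric date such as 31/01/2023 by which part must be the day."""
    match = re.match(r"^\s*(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})", value)
    if not match:
        return "unrecognised"
    first, second = int(match.group(1)), int(match.group(2))
    if first > 12 and second <= 12:
        return "day-first"
    if second > 12 and first <= 12:
        return "month-first"
    if first > 12 and second > 12:
        return "unrecognised"
    return "ambiguous"  # for example 02/03/2023 fits both conventions


def check_rows(csv_text, date_col="date", subject_col="subject", body_col="body"):
    """Return record numbers (header counted as 1) for a few common issues."""
    report = {"duplicates": [], "blank": [], "extra_fields": [],
              "irregular_spacing": [], "date_orders": Counter()}
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    for number, row in enumerate(reader, start=2):
        # DictReader stores surplus fields under the key None.
        if None in row:
            report["extra_fields"].append(number)
        key = tuple(str(value) for value in row.values())
        if key in seen:
            report["duplicates"].append(number)
        seen.add(key)
        subject = (row.get(subject_col) or "").strip()
        body = (row.get(body_col) or "").strip()
        if not subject and not body:
            report["blank"].append(number)
        # Two or more consecutive spaces between non-space characters.
        if re.search(r"\S {2,}\S", body):
            report["irregular_spacing"].append(number)
        date = (row.get(date_col) or "").strip()
        if date:
            report["date_orders"][date_order(date)] += 1
    return report
```

If date_orders contains both day-first and month-first counts, the extract mixes date conventions and should be normalized to a single format before upload. Dates reported as ambiguous cannot be classified from the value alone, so confirm their format with the owner of the data.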
