Communications Mining User Guide
Last updated Nov 19, 2024

Preparing data for .CSV upload

User permissions required: 'Sources admin' AND 'Edit messages'.

Instructions for uploading data from a .csv file, along with common error messages you may encounter in the platform, are available separately in this user guide.

Before uploading data into Communications Mining, there are a few factors to consider when preparing the data for ingestion by the platform.

Important:

Please ensure you are uploading a .csv file, and not an Excel file.

If you have been opening the .csv in Excel and making changes, this can introduce formatting issues that cause errors at the point of upload. To avoid this, make any updates in the .csv directly.
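
As a rough illustration of the first point, the sketch below checks whether a file that is about to be uploaded is actually an Excel workbook, even if it has been renamed to .csv. It is not part of the platform: the function names `looks_like_excel` and `check_is_plain_csv` are hypothetical. The check relies on the leading bytes of the two Excel container formats, and because every ZIP archive shares the .xlsx signature, treat a match as a strong hint rather than proof.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the two Excel workbook containers.
XLSX_MAGIC = b"PK\x03\x04"                       # .xlsx is a ZIP archive
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)


def looks_like_excel(path):
    """Return True if the file starts with an Excel container signature."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    return head.startswith(XLSX_MAGIC) or head.startswith(XLS_MAGIC)


def check_is_plain_csv(path):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper for illustration only.
    """
    problems = []
    if Path(path).suffix.lower() != ".csv":
        problems.append("file extension is not .csv")
    if looks_like_excel(path):
        problems.append("file content is an Excel workbook, not plain-text CSV")
    return problems
```

Calling `check_is_plain_csv("export.csv")` (where `export.csv` is a placeholder file name) before uploading returns an empty list when neither problem is detected.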

Additionally, check for the following before uploading your .csv into the platform, to avoid upload errors or data quality issues that will negatively impact model performance:

| Item | Description |
| --- | --- |
| Duplicate rows | The same data repeated multiple times across the data extract. |
| Mismatched headers | Headers aligned to the wrong data fields. |
| Hanging rows or columns | Data that is not contained in sequential rows. Example: all messages are in rows 1 to 10,000, but a cell in row 19,999 also contains data. |
| Inconsistent date formatting | Different rows using different date formats. Example: some messages in US date format and others in EU date format within the same dataset, which causes problems when dates are normalized downstream. |
| Incoherent sentences | Sentences that contain an assortment of words without a clear syntactic or semantic structure. Example: 'The user is requesting a new portable 28442 298 ticket to be creaportableted' |
| Inconsistent spacing | An irregular number of spaces between words. Example: 'The   policy is  set   to renew' instead of 'The policy is set to renew' |
| Breaks in words | Breaks in the middle of a word where there should be none. Example: 'The po licy is set. to renew' instead of 'The policy is set to renew' |
| Erroneous character encoding | Text data that is not properly encoded, resulting in garbled or unreadable characters. Example: 'ThÇ åpp is gré¶t' instead of 'The app is great.' |
| Blank messages | Communications without any content in the subject or body. |
| Messages with lots of typos | Text data containing many spelling errors. |
| Headers / footers | Headers or footers included in the message text. Example: spam warnings, virus scan warnings, etc. |
| Metadata included in the subject/body instead of as a metadata property | Metadata that appears in the subject or body. Example: '[01/01/2023] I would like to renew my policy' as the body of a message, instead of 'I would like to renew my policy' as the message with 01/01/2023 as the date included in the metadata. |
| Multiple messages combined into one message | Multiple messages that should have been broken out into separate messages in a thread, combined into a single communication. |
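
Several of the structural checks above can be automated before upload. The following is a minimal sketch using only the Python standard library, not a platform feature. The column names `body` and `date`, the list of candidate date formats, and the function names `detect_date_format` and `audit_csv` are all assumptions made for illustration; adjust them to match your own extract. Note that an ambiguous date such as 01/02/2023 matches whichever format comes first in the list, so the sketch can only flag clear-cut mixtures of formats.

```python
import csv
import io
import re
from collections import Counter
from datetime import datetime

# Hypothetical column names and candidate date formats; adjust to your extract.
BODY_COLUMN = "body"
DATE_COLUMN = "date"
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]


def detect_date_format(value):
    """Return the first format in DATE_FORMATS that parses value, else None."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def audit_csv(text):
    """Scan CSV text and return a dict mapping issue name to record numbers."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    body_idx = header.index(BODY_COLUMN)
    date_idx = header.index(DATE_COLUMN)

    issues = {"duplicate": [], "wrong_width": [], "empty_row": [],
              "blank_message": [], "irregular_spacing": [],
              "unparsed_date": []}
    seen = set()
    formats = Counter()

    # Record numbers are 1-based and count the header as record 1.
    for row_no, row in enumerate(reader, start=2):
        if not any(cell.strip() for cell in row):
            issues["empty_row"].append(row_no)    # gap that may precede hanging rows
            continue
        if len(row) != len(header):
            issues["wrong_width"].append(row_no)  # extra or missing cells
            continue
        key = tuple(row)
        if key in seen:
            issues["duplicate"].append(row_no)
        seen.add(key)
        body = row[body_idx]
        if not body.strip():
            issues["blank_message"].append(row_no)
        elif re.search(r" {2,}", body):
            issues["irregular_spacing"].append(row_no)
        fmt = detect_date_format(row[date_idx].strip())
        if fmt is None:
            issues["unparsed_date"].append(row_no)
        else:
            formats[fmt] += 1

    # More than one entry here suggests inconsistent date formatting.
    issues["date_formats_seen"] = sorted(formats)
    return issues
```

To audit a file, read it as text first, for example `audit_csv(open("export.csv", encoding="utf-8", newline="").read())`, where `export.csv` is a placeholder name. Opening the file as UTF-8 also surfaces some encoding problems early, since undecodable bytes raise an error. Checks such as incoherent sentences, typos, or merged messages need human review and are not covered by this sketch.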
