Preparing data for .CSV upload
User permissions required: 'Sources admin' AND 'Edit messages'.
You can find instructions on uploading data from a .csv here, along with common error messages you may encounter in the platform.
Before uploading data into Communications Mining, there are a few factors to consider when preparing the data for ingestion by the platform.
Please ensure you are uploading a .csv file, and not an Excel file.
Opening the .csv in Excel and making changes there can introduce formatting issues that cause errors at the point of upload. To avoid this, please make any updates in the .csv directly.
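One way to change a .csv without opening it in Excel is to edit it with a short script. The following is a minimal sketch using Python's standard csv module, not a platform feature; the function name `update_csv_column` and the column name `subject` are hypothetical. Every value is read and written back as text, so dates and long numbers are not silently reformatted.

```python
import csv
import io

def update_csv_column(src, dst, column, transform):
    """Copy CSV rows from src to dst, applying transform to one column.

    src and dst are text streams. All values stay as strings, so nothing
    is reinterpreted as a date or a number on the way through.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    idx = header.index(column)  # raises ValueError if the column is missing
    writer.writerow(header)
    for row in reader:
        # Only touch rows that actually have a cell in the target column.
        if idx < len(row):
            row[idx] = transform(row[idx])
        writer.writerow(row)

# Small in-memory demonstration (the sample data is made up).
sample = io.StringIO("id,subject,date\n1,  Renewal query ,01/02/2023\n")
out = io.StringIO()
update_csv_column(sample, out, "subject", str.strip)
print(out.getvalue())
```

When working with real files instead of in-memory text, open them with `newline=""` and an explicit encoding such as `encoding="utf-8"`, as the Python csv documentation recommends.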
Additionally, please check for the following before uploading your .csv into the platform, to avoid upload errors or data quality issues that will degrade model performance:
Item | Description |
Duplicate rows | The same data repeated multiple times across the data extract |
Mismatched headers | Headers aligned to the wrong data fields |
Hanging rows or columns | Data that is not all contained in sequential rows. Example: all messages are in rows 1 to 10,000, but a cell in row 19,999 also contains data. |
Inconsistent date formatting | Different rows using different date formats. Example: some messages in US date format and others in EU date format within the same dataset, which causes problems when dates are normalized downstream. |
Incoherent sentences | Sentences that contain an assortment of words without a clear syntactic or semantic structure. Example: 'The user is requesting a new portable 28442 298 ticket to be creaportableted' |
Inconsistent spacing | An irregular number of spaces between words. Example: 'The   policy is  set to    renew' instead of 'The policy is set to renew' |
Breaks in words | Breaks in the middle of a word where there should be none. Example: 'The po licy is set to renew' instead of 'The policy is set to renew' |
Erroneous character encoding | Text data that is not properly encoded, resulting in garbled or unreadable characters. Example: 'ThÇ åpp is gré¶t' instead of 'The app is great.' |
Blank messages | Communications without any content in the subject/body |
Messages with lots of typos | Text data containing many spelling errors |
Headers / footers | Headers or footers included in the messages. Example: spam warnings, virus scan warnings, etc. |
Metadata included in the subject/body instead of as a metadata property | Metadata that appears in the subject or body. Example: '[01/01/2023] I would like to renew my policy' as the body of a message, instead of 'I would like to renew my policy' as the message with 01/01/2023 as the date included in the metadata. |
Multiple messages combined into one message | Multiple messages that should have been separate messages in a thread, combined into a single communication. |
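Several of the checks above can be automated before upload. The following is a rough sketch, not part of the platform: it assumes hypothetical column names `date` and `body`, assumes dates are slash-separated, and covers only empty (hanging) rows, duplicate rows, wrong column counts, blank messages, irregular spacing and mixed US/EU date order. The function names `classify_date` and `check_csv` are illustrative.

```python
import csv
import io
import re
from collections import Counter

def classify_date(value):
    """Classify a slash-separated date by day/month order (no calendar validation)."""
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value.strip())
    if not match:
        return "unrecognized"
    first, second = int(match.group(1)), int(match.group(2))
    if first > 12 and second <= 12:
        return "day-first"      # e.g. 25/01/2023 (EU style)
    if second > 12 and first <= 12:
        return "month-first"    # e.g. 01/25/2023 (US style)
    if first <= 12 and second <= 12:
        return "ambiguous"      # e.g. 02/03/2023 could be either
    return "unrecognized"

def check_csv(src, body_column="body", date_column="date"):
    """Scan a CSV text stream and report record numbers per issue.

    Record 1 is the header, so the first data record is number 2.
    """
    reader = csv.reader(src)
    header = next(reader)
    body_idx = header.index(body_column)
    date_idx = header.index(date_column)
    issues = {
        "empty_rows": [],
        "duplicate_rows": [],
        "wrong_column_count": [],
        "blank_messages": [],
        "irregular_spacing": [],
    }
    date_styles = Counter()
    seen = set()
    for record_no, row in enumerate(reader, start=2):
        # Rows with no data at all usually indicate hanging rows further down.
        if not any(cell.strip() for cell in row):
            issues["empty_rows"].append(record_no)
            continue
        key = tuple(row)
        if key in seen:
            issues["duplicate_rows"].append(record_no)
            continue
        seen.add(key)
        if len(row) != len(header):
            issues["wrong_column_count"].append(record_no)
            continue
        body = row[body_idx]
        if not body.strip():
            issues["blank_messages"].append(record_no)
        elif re.search(r" {2,}", body):
            issues["irregular_spacing"].append(record_no)
        date_styles[classify_date(row[date_idx])] += 1
    # Both unambiguous orders present means US and EU formats are mixed.
    issues["mixed_date_order"] = (
        date_styles["day-first"] > 0 and date_styles["month-first"] > 0
    )
    return issues

# Made-up sample data that triggers each check once.
sample = io.StringIO(
    "id,date,body\n"
    "1,01/25/2023,I would like to renew my policy\n"
    "2,25/01/2023,The  policy   is set to renew\n"
    "3,02/03/2023,\n"
    "1,01/25/2023,I would like to renew my policy\n"
    ",,\n"
    "6,02/04/2023,Please cancel my policy\n"
)
report = check_csv(sample)
for name, value in report.items():
    print(name, value)
```

Items such as incoherent sentences, typos, or headers and footers inside message text are harder to detect mechanically and still need a manual review of a sample of the data.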