- Introduction
- Setting up your account
- Balance
- Clusters
- Concept drift
- Coverage
- Datasets
- General fields
- Labels (predictions, confidence levels, label hierarchy, and label sentiment)
- Models
- Streams
- Model Rating
- Projects
- Precision
- Recall
- Annotated and unannotated messages
- Extraction Fields
- Sources
- Taxonomies
- Training
- True and false positive and negative predictions
- Validation
- Messages
- Access Control and Administration
- Manage sources and datasets
- Understanding the data structure and permissions
- Creating or deleting a data source in the GUI
- Uploading a CSV file into a source
- Preparing data for .CSV upload
- Creating a dataset
- Multilingual sources and datasets
- Enabling sentiment on a dataset
- Amending dataset settings
- Deleting a message
- Deleting a dataset
- Exporting a dataset
- Using Exchange integrations
- Model training and maintenance
- Understanding labels, general fields, and metadata
- Label hierarchy and best practices
- Comparing analytics and automation use cases
- Turning your objectives into labels
- Overview of the model training process
- Generative Annotation
- Dataset status
- Model training and annotating best practice
- Training with label sentiment analysis enabled
- Training chat and calls data
- Understanding data requirements
- Train
- Introduction to Refine
- Precision and recall explained
- Precision and Recall
- How validation works
- Understanding and improving model performance
- Reasons for label low average precision
- Training using Check label and Missed label
- Training using Teach label (Refine)
- Training using Search (Refine)
- Understanding and increasing coverage
- Improving Balance and using Rebalance
- When to stop training your model
- Using general fields
- Generative extraction
- Using analytics and monitoring
- Automations and Communications Mining™
- Developer
- Exchange Integration with Azure service user
- Exchange Integration with Azure Application Authentication
- Exchange Integration with Azure Application Authentication and Graph
- Fetching data for Tableau with Python
- Elasticsearch integration
- Self-hosted Exchange integration
- UiPath® Automation Framework
- UiPath® Marketplace activities
- UiPath® official activities
- How machines learn to understand words: a guide to embeddings in NLP
- Prompt-based learning with Transformers
- Efficient Transformers II: knowledge distillation & fine-tuning
- Efficient Transformers I: attention mechanisms
- Deep hierarchical unsupervised intent modelling: getting value without training data
- Fixing annotating bias with Communications Mining™
- Active learning: better ML models in less time
- It's all in the numbers - assessing model performance with metrics
- Why model validation is important
- Comparing Communications Mining™ and Google AutoML for conversational data intelligence
- Licensing
- FAQs and more

Communications Mining user guide
Model training FAQs
- General model training
- Label training
The objective of training a model is to create a set of training data that is as representative as possible of the dataset as a whole, so that the platform can accurately and confidently predict the relevant labels and general fields for each message. The labels and general fields within a dataset should be intrinsically linked to the overall objectives of the use case and provide significant business value.
As soon as data is uploaded to the platform, the platform begins a process called unsupervised learning, through which it groups messages into clusters of similar semantic intent. This process can take up to a couple of hours, depending on the size of the dataset, and clusters will appear once it is complete.
To be able to train a model, you need a minimum amount of existing historical data. This is used as training data to provide the platform with the necessary information to confidently predict each of the relevant concepts for your analysis and/or automation.
The recommendation for any use case is a minimum of 12 months of historical data, in order to properly capture any seasonality or irregularity in the data, such as month-end processes and busy seasons.
No, you do not need to save your model after making changes. Every time you train the platform on your data, that is, each time you annotate messages, a new model version is created for your dataset. Performance statistics for older model versions can be viewed on the Validation page.
Check the Validation page in the platform, which reports various performance measures and provides a holistic model health rating. This page updates after every training event and it can be used to identify areas where the model may need more training examples or some label corrections in order to ensure consistency.
For complete explanations of model performance and how to improve it, check Validation.
Clusters are a helpful way to quickly build up your taxonomy, but users will spend most of their time training in the Explore page rather than in Discover.
If users spend too much time annotating via clusters, there’s a risk of overfitting the model to look for messages that only fit these clusters when making predictions. The more varied examples there are for each label, the better the model will be at finding the different ways of expressing the same intent or concept. This is one of the main reasons why we only show 30 clusters at a time.
However, once enough training has been completed or a significant volume of data has been added to the platform, Discover does retrain. When it retrains, it takes into account the existing training to date, and will try to present new clusters that are not well covered by the current taxonomy.
For more details, check Discover.
There are 30 clusters in total, each containing 12 messages. In the platform, you can adjust the number of messages shown per page, in increments between 6 and 12. We recommend annotating 6 at a time, to reduce the risk of partially annotating any messages.
Precision and recall are metrics used to measure the performance of a machine learning model. A detailed description of each can be found under the Using Validation section of our how-to guides.
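As a quick illustration of how these two metrics are computed, here is a minimal sketch using hypothetical counts for a single label (the numbers are invented for the example, not taken from the platform):

```python
# Hypothetical prediction counts for one label -- illustrative only.
true_positives = 40   # messages correctly predicted with the label
false_positives = 10  # messages predicted with the label, but incorrectly
false_negatives = 20  # messages that should have the label, but were missed

# Precision: of everything predicted, how much was correct?
precision = true_positives / (true_positives + false_positives)

# Recall: of everything that should have been found, how much was found?
recall = true_positives / (true_positives + false_negatives)

print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this example, precision is 0.80 (40 of 50 predictions were correct) and recall is roughly 0.67 (40 of the 60 relevant messages were found), showing how a model can be precise yet still miss a meaningful share of relevant messages.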
You can access the validation overview of earlier models by hovering over Model Version in the Validation page. This can be helpful for tracking and comparing progress as you train out your model.
If you need to roll your model back to a previous pinned version, check Model rollback for more details.
Yes, it’s really easy to do. You can go into the settings for each label and rename it at any point. For more details, check Label editing.
Information about your dataset, including how many messages have been annotated, is displayed on the Dataset Settings page. For more details on how to access it, check Amend dataset settings.
If the Validation page shows that your label is performing poorly, there are various ways to improve its performance. To understand more, check Understanding and improving model performance.
The little red dials next to each label/general field indicate whether more examples are needed for the platform to accurately estimate the label/general field's performance. The dials start to disappear as you provide more training examples and will disappear completely once you reach 25 examples.
After this, the platform will be able to effectively evaluate the performance of a given label/general field and may return a performance warning if the label or general field is not healthy.
The platform is able to learn from empty and uninformative messages, as long as they are annotated correctly. However, it is worth noting that uninformative labels will likely need a significant number of training examples, and these examples should be loosely grouped by concept, to ensure the best performance.
- General model training
- What is the objective of training a model?
- Why can I not see anything in Discover if I've just uploaded data into the platform?
- How much historical data do I need to train a model?
- Do I need to save my model every time I make a change?
- How do I know what the performance of the model is?
- Why are there only 30 clusters available and can we set them individually?
- How many messages are in each cluster?
- What do precision and recall mean?
- Can I return to an earlier version of my model?
- Label training
- Can I change the name of a label later on?
- How do I find out the number of messages I have annotated?
- One of my labels is performing poorly, what can I do to improve it?
- What does the red dial next to my label or general field indicate? How do I get rid of it?
- Should I avoid annotating empty or uninformative messages?