Communications Mining user guide
Training using Shuffle
Shuffle is the first step in the Explore phase. Its purpose is to present users with a random selection of messages to review. In Shuffle mode, the platform shows messages with predictions across all labels, as well as messages with no predictions at all. Shuffle therefore differs from the other steps in Explore in that it does not focus on training a specific label, but covers them all.
It is very important to use Shuffle mode so that you provide your model with sufficient training examples that are representative of the dataset as a whole, rather than biased toward very specific areas of the data.
Overall, at least 10% of the training you complete in your dataset should be in Shuffle mode.
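The 10% guideline above can be expressed as a simple check. The function and counts below are purely illustrative (the platform tracks annotation counts in its own UI; these names and numbers are assumptions, not part of the product):

```python
# Illustrative check of the guideline that at least 10% of the training
# completed in a dataset should happen in Shuffle mode.
# All counts here are hypothetical example values.

def shuffle_share_ok(shuffle_annotated: int, total_annotated: int,
                     minimum_share: float = 0.10) -> bool:
    """Return True if the Shuffle-mode share of annotations meets the minimum."""
    if total_annotated == 0:
        return False  # no training done yet
    return shuffle_annotated / total_annotated >= minimum_share

print(shuffle_share_ok(120, 1000))  # 12% of annotations from Shuffle -> True
print(shuffle_share_ok(40, 1000))   # only 4% from Shuffle -> False
```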
Annotating in Shuffle mode helps ensure that your taxonomy covers the data within your dataset well, and prevents you from creating a model that makes accurate predictions on only a small fraction of the data.
Looking through messages in Shuffle mode is therefore an easy way to get a sense of how the overall model is doing, and can be used throughout the training process. In a well-trained taxonomy, you should be able to go through any unreviewed messages in Shuffle and simply accept the predictions to further train the model. If many of the predictions are incorrect, you can see which labels require more training.
Going through multiple pages in Shuffle later in the training process is also a good way to check whether there are intents or concepts that should have been captured by your taxonomy but were not. You can then apply existing labels where required, or create new ones as needed.
- Select Shuffle from the drop-down menu to be presented with 20 random messages.
- Filter to unreviewed messages.
- Review each message and any associated predictions:
- If there are predictions, confirm or reject them. Confirm by selecting the ones that apply.
- Also add any additional labels that apply.
- If you reject the predictions, apply all of the correct labels. Make sure you do not leave the message with no labels applied.
- You can also select the refresh button to get a new set of messages, or continue to the next page using the page numbers or arrows.
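The per-message decision logic in the steps above can be sketched as follows. In the platform this is done by clicking labels in the GUI; the data structures (plain sets of label names) and the function itself are hypothetical, included only to make the confirm/reject/add rules concrete:

```python
# Sketch of the review logic for one message in Shuffle mode.
# `predicted` is the set of labels the model predicted; `correct` is the
# set of labels that actually apply. Both are hypothetical inputs.

def review_message(predicted: set, correct: set) -> dict:
    """Classify each prediction and return the labels to apply."""
    confirmed = predicted & correct   # predictions you accept
    rejected = predicted - correct    # predictions you reject
    added = correct - predicted       # labels the model missed
    applied = confirmed | added       # final labels on the message
    # Per the guidance above, never leave a message with no labels applied.
    assert applied, "apply at least one correct label before moving on"
    return {"confirmed": confirmed, "rejected": rejected,
            "added": added, "applied": applied}

result = review_message({"Billing", "Complaint"}, {"Billing", "Refund"})
print(result["applied"])  # the message ends up labelled Billing and Refund
```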
You are recommended to annotate at least 10 pages of messages in Shuffle. In large datasets that contain many training examples, this could be much more.