Building the taxonomy structure
One of the most fundamental factors that will determine how well your model performs, as well as how well it meets your business objectives, is the structure of your taxonomy, including what is captured by each of the labels within it.
It is, therefore, important to think about your target taxonomy structure in advance of model training. Having said that, you should have a degree of flexibility to adapt, expand and enhance it as necessary as you progress through the training. This is what we call the data-led training approach.
Ultimately, the labels in the taxonomy, and the training examples provided for each label, should create an accurate and balanced representation of the dataset as a whole. But every label should also be valuable in its own right, clearly representing the messages for which it is predicted.
If labels are used to capture very broad, vague, or confused concepts, not only are they more likely to perform badly, they are also less likely to provide business value, whether that value is useful insights into the concept, or helping a downstream process be fully or partially automated.
This is an example of a high-level taxonomy with typical labels applicable across various use cases or industries. Not all of them will be applicable to your model.
A company receives millions of emails each year into different inboxes from clients about a multitude of issues, queries, suggestions, complaints, etc.
This company decides to increase their operational efficiency, process standardisation, and visibility as to what's going on in their business by automatically turning these emails from customers into workflow tickets. These can then be tracked and actioned, using specified processes and within set timelines.
To do this, they decide to use the platform to interpret these inbound, unstructured communications and provide a classification of the process and sub-process that each email relates to. This classification is used to update the workflow ticket that will be automatically created using an automation service, and to ensure that it's routed to the correct team or individual.
To make sure that this use case is as successful as possible, and to minimise the number of exceptions (wrong classifications, or emails the platform is not able to confidently classify), every inbound email should receive a confident prediction that has a parent label and a child label, i.e. [Process X] > [Sub-Process Y].
Given that the objective is to classify every inbound email with a [Process] and [Sub-process], every label in the taxonomy should conform to this [Process] > [Sub-process] format.
In this use case, any email that does not have a confident prediction for both a parent and child label could be an exception, sent for manual review and ticket creation. Alternatively, if it has a high confidence parent label prediction, but not a confident child label prediction, this could still be used to partially route the email or create a ticket, with some additional manual work to add the relevant sub-process.
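To make these two handling options concrete, here is a minimal sketch of how the routing decision could be expressed in code. The confidence threshold, label names, and routing outcomes are illustrative assumptions, not part of the platform's API.

```python
# Illustrative sketch only: the threshold, label names, and routing
# outcomes are assumptions, not Communications Mining API calls.
CONFIDENCE_THRESHOLD = 0.9  # assumed per-label confidence cut-off

def route_email(predictions):
    """Decide how to handle an email given its label predictions.

    `predictions` maps label names (for example "Invoicing > Status Request")
    to confidence scores between 0 and 1.
    """
    confident = {
        label: score
        for label, score in predictions.items()
        if score >= CONFIDENCE_THRESHOLD
    }
    # Full [Process] > [Sub-Process] prediction: create and route the ticket.
    full = [label for label in confident if " > " in label]
    if full:
        return ("auto_route", full[0])
    # Only a parent [Process] label is confident: partially route the email,
    # with manual work to add the relevant sub-process.
    parent_only = [label for label in confident if " > " not in label]
    if parent_only:
        return ("partial_route", parent_only[0])
    # No confident prediction at all: send to the manual exception queue.
    return ("manual_exception", None)

# Example: a confident parent label but no confident child label.
print(route_email({"Invoicing": 0.95, "Invoicing > Status Request": 0.62}))
# -> ('partial_route', 'Invoicing')
```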
If we imagine the former is true, and every email without a high confidence prediction in the form of [Process] > [Sub-Process] becomes a manual exception, we want to make sure that all of the examples we provide for each label when training the model reflect this format.
Each parent label in the taxonomy should relate to a broad process relevant to the content in the emails, for example, Invoicing. Each child label should then be a more specific sub-process that sits under a parent label, for example, Invoicing > Status Request.
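A fragment of such a taxonomy can be sketched as a simple parent-to-children mapping. Apart from Invoicing > Status Request, which is taken from the example above, the label names below are invented purely for illustration.

```python
# Hypothetical taxonomy fragment: each parent label is a broad process,
# each child label a specific sub-process beneath it.
taxonomy = {
    "Invoicing": ["Status Request", "Dispute", "Copy Request"],
    "Payments": ["Confirmation", "Amendment"],
}

# Flatten into the [Process] > [Sub-Process] label names used for training.
labels = [
    f"{parent} > {child}"
    for parent, children in taxonomy.items()
    for child in children
]
print(labels)  # ['Invoicing > Status Request', 'Invoicing > Dispute', ...]
```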
Extremely broad labels such as General Query or Everything else can be very unhelpful if they are used to group together lots of distinct topics, with no clear pattern or commonality between the pinned examples.
In this use case, they would also not provide much business value when a workflow ticket was created and classified as General Query or Everything else. Someone would still need to read it carefully to understand what it was about and whether it was relevant for their team before it could be actioned.
This negates any time-saving benefit and would not provide useful management information (MI) to the business on what work was actually being done by the teams.