Multilingual sources and datasets
Communications Mining™ supports multilingual sources and datasets. This means that models can understand sources that contain multiple supported languages, without having to translate them. The fully supported languages are:
- English
- Dutch
- French
- German
- Italian
- Japanese
- Portuguese
- Spanish
If you work and do business in several languages that the platform supports, you can train on messages in those languages rather than translating everything into a single language. However, keep the following in mind:
- If a dataset is multilingual, you cannot view translations of its messages, as you can for translated datasets. As a result, you need to understand all of the languages in the dataset to train your model effectively.
- Understanding multiple languages is a more complex machine-learning problem than understanding a single language. As a result, multilingual datasets may see a slight drop in performance compared to single-language datasets.
- If the dataset contains languages other than the supported ones, applying the labels used for supported languages may cause confusion. Instead, annotate these instances with language-specific labels; a language-detection sketch that supports this routing follows the note below.
Note: The platform cannot process or understand the content of unsupported languages.
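If your pipeline ingests messages whose language is not known in advance, it can help to detect the language before upload, so that messages in unsupported languages can be routed to language-specific labels as described above. The following is a minimal sketch using the open-source langdetect Python library; the supported-language codes mirror the fully supported languages listed earlier, and the helper function is illustrative, not part of the Communications Mining platform or API.

```python
# pip install langdetect
from langdetect import DetectorFactory, detect
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # make langdetect's guesses deterministic

# ISO 639-1 codes for the fully supported languages listed above.
FULLY_SUPPORTED = {"en", "nl", "fr", "de", "it", "ja", "pt", "es"}

def route_message(text: str) -> str:
    """Return a routing hint: 'supported' for languages the model
    understands natively, otherwise a tag (for example 'lang:pl')
    that can back a language-specific label."""
    try:
        lang = detect(text)  # best-effort guess, e.g. 'de'
    except LangDetectException:
        return "lang:unknown"  # empty or undecidable text
    return "supported" if lang in FULLY_SUPPORTED else f"lang:{lang}"

print(route_message("Bitte senden Sie mir die Rechnung."))  # supported (German)
print(route_message("Proszę o przesłanie faktury."))        # lang:pl (Polish)
```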
When you create a data source or a dataset, the platform selects English as the default language for both.
To change the language while creating your data source or dataset, proceed as follows:
- Navigate to the Set the language, and enable translation for your source step.
- In the Language dropdown menu, select Multilingual.
Note the following:
- You can no longer change the language once the data source or dataset is created, so set it correctly at creation time (see the API sketch after this list).
- Multilingual datasets can contain sources in any of the languages that the platform supports.
- To learn how to create data sources and datasets, check Creating a data source and Creating a dataset.
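You can also set the language when creating a source programmatically. The sketch below is illustrative only: it assumes an endpoint of the form PUT /api/v1/sources/<project>/<source-name> that accepts a language property with a multilingual value, so confirm the exact endpoint, field names, and accepted values against your tenant's Communications Mining API reference before relying on it. As above, the language cannot be changed after creation.

```python
# A minimal sketch, assuming the endpoint shape and field names below;
# confirm them against your tenant's Communications Mining API reference.
import os

import requests

API_BASE = os.environ["CM_API_BASE"]    # assumed, e.g. https://<host>/api/v1
API_TOKEN = os.environ["CM_API_TOKEN"]  # API bearer token

def create_multilingual_source(project: str, name: str, title: str) -> dict:
    """Create a source whose language is set to multilingual.
    The language is fixed at creation time and cannot be changed later."""
    resp = requests.put(
        f"{API_BASE}/sources/{project}/{name}",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"source": {"title": title, "language": "multilingual"}},  # assumed fields
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```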
We currently support a wide range of additional languages in Preview mode, as shown in the following list. This means that our team continues to refine them based on your usage.
- Afrikaans
- Albanian
- Amharic
- Arabic
- Armenian
- Assamese
- Azerbaijani
- Basque
- Belarusian
- Bengali
- Bengali (Romanized)
- Bosnian
- Breton
- Bulgarian
- Burmese
- Catalan
- Chinese (Simplified)
- Chinese (Traditional)
- Croatian
- Czech
- Danish
- Esperanto
- Estonian
- Filipino
- Finnish
- Galician
- Georgian
- Greek
- Gujarati
- Hausa
- Hebrew
- Hindi
- Hindi (Romanized)
- Hungarian
- Icelandic
- Indonesian
- Irish
- Javanese
- Kannada
- Kazakh
- Khmer
- Korean
- Kurdish (Kurmanji)
- Kyrgyz
- Lao
- Latin
- Latvian
- Lithuanian
- Macedonian
- Malagasy
- Malay
- Malayalam
- Marathi
- Mongolian
- Nepali
- Norwegian
- Oriya
- Oromo
- Pashto
- Persian
- Polish
- Punjabi
- Romanian
- Russian
- Sanskrit
- Scottish Gaelic
- Serbian
- Sindhi
- Sinhala
- Slovak
- Slovenian
- Somali
- Sundanese
- Swahili
- Swedish
- Swiss German
- Tamil
- Tamil (Romanized)
- Telugu
- Telugu (Romanized)
- Thai
- Turkish
- Ukrainian
- Urdu
- Urdu (Romanized)
- Uyghur
- Uzbek
- Vietnamese
- Welsh
- Western Frisian
- Xhosa
- Yiddish