AI Center - Multilabel Text Classification

Multilabel Text Classification

Note:

This ML package will soon be deprecated. For more information, check the Deprecation timeline page from the Overview guide.

Note:

Multilabel Text Classification is currently in public preview.

UiPath® is committed to the stability and quality of our products, but preview features are always subject to change based on feedback that we receive from our customers. Using preview features is not recommended for production deployments.

This is a generic, retrainable model for tagging a text with multiple labels. This ML Package must be trained before it is deployed; deploying it without training first fails with an error stating that the model is not trained. It is based on BERT, a self-supervised method for pretraining natural language processing systems. A GPU is recommended, especially during training, and delivers roughly a 5-10x improvement in speed.

Languages

This multilingual model supports the languages in the following list. These languages were chosen because they are the top 100 languages by Wikipedia size:

Afrikaans
Albanian
Arabic
Aragonese
Armenian
Asturian
Azerbaijani
Bashkir
Basque
Bavarian
Belarusian
Bengali
Bishnupriya Manipuri
Bosnian
Breton
Bulgarian
Burmese
Catalan
Cebuano
Chechen
Chinese (Simplified)
Chinese (Traditional)
Chuvash
Croatian
Czech
Danish
Dutch
English
Estonian
Finnish
French
Galician
Georgian
German
Greek
Gujarati
Haitian
Hebrew
Hindi
Hungarian
Icelandic
Ido
Indonesian
Irish
Italian
Japanese
Javanese
Kannada
Kazakh
Kirghiz
Korean
Latin
Latvian
Lithuanian
Lombard
Low Saxon
Luxembourgish
Macedonian
Malagasy
Malay
Malayalam
Marathi
Minangkabau
Nepali
Newar
Norwegian (Bokmal)
Norwegian (Nynorsk)
Occitan
Persian (Farsi)
Piedmontese
Polish
Portuguese
Punjabi
Romanian
Russian
Scots
Serbian
Serbo-Croatian
Sicilian
Slovak
Slovenian
South Azerbaijani
Spanish
Sundanese
Swahili
Swedish
Tagalog
Tajik
Tamil
Tatar
Telugu
Turkish
Ukrainian
Urdu
Uzbek
Vietnamese
Volapük
Waray-Waray
Welsh
West Frisian
Western Punjabi
Yoruba

Model details

Input type

JSON

Input description

The text to be classified, as a String. For example: 'I love this actor but I hate his movies.'

Output description

JSON with two lists. The first list contains the predicted label(s), and the second list contains the confidence associated with each predicted label (a value between 0 and 1).

Example:

{
  "labels": [
    "deliver",
    "payment"
  ],
  "confidence": [
    0.780,
    0.899
  ]
}
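
The two lists are parallel, so each confidence value belongs to the label at the same position. As a minimal sketch, assuming the response arrives as a JSON string shaped like the example above, the labels can be paired with their confidences and filtered. The helper name pair_predictions and the threshold parameter are hypothetical and not part of the ML Package.

```python
import json


def pair_predictions(response_text, threshold=0.5):
    """Pair each predicted label with its confidence and keep those at or above threshold.

    Hypothetical helper: assumes the JSON shape shown in the example above.
    """
    payload = json.loads(response_text)
    labels = payload["labels"]
    confidences = payload["confidence"]
    if len(labels) != len(confidences):
        raise ValueError("labels and confidence must have the same length")
    return [(label, conf) for label, conf in zip(labels, confidences) if conf >= threshold]


example = '{"labels": ["deliver", "payment"], "confidence": [0.780, 0.899]}'
pairs = pair_predictions(example, threshold=0.8)
print(pairs)  # [('payment', 0.899)]
```

The threshold is a choice for the consuming workflow; the documentation above does not prescribe one.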

Recommend GPU

By default, a GPU is recommended.

Training enabled

Training is enabled.

Pipelines

This package supports all three types of pipelines (Full Training, Training, and Evaluation). For most use cases, no parameters need to be specified. The model uses advanced techniques to find a performant model. In every training run after the first one, the model uses incremental learning: the run starts from the version produced at the end of the previous training run.

Dataset format

The model will read all CSV files in the specified directory. In every CSV file, the model expects two columns or two properties, text and label by default. The names of these two columns and/or properties are configurable using environment variables.

CSV file format

Each CSV file can have any number of columns, but only two will be used by the model. Those columns are specified by the parameters dataset.text_column_name (if not modified, the default value is text) and dataset.target_column_name (if not modified, the default value is labels).

For example, a single CSV file can look like this:

text,labels
"I love this actor but I hate his movies", ['positive', 'negative']
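
Because the model reads every CSV file in the directory, a file that lacks one of the two expected columns may be worth catching before a pipeline is started. The following is a minimal sketch of such a header check. The function name find_csvs_missing_columns is hypothetical, and the default column names are assumptions taken from the example above; pass whatever values dataset.text_column_name and dataset.target_column_name are set to.

```python
import csv
import tempfile
from pathlib import Path


def find_csvs_missing_columns(directory, text_column="text", target_column="labels"):
    """Return {file name: [missing column names]} for CSV files lacking a required column.

    Hypothetical pre-flight check; only the header row of each file is inspected.
    """
    problems = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        names = {name.strip() for name in header}
        missing = [col for col in (text_column, target_column) if col not in names]
        if missing:
            problems[path.name] = missing
    return problems


# Usage with two throwaway files: one valid header, one without the target column.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.csv").write_text("id,text,labels\n", encoding="utf-8")
    Path(tmp, "bad.csv").write_text("id,text\n", encoding="utf-8")
    report = find_csvs_missing_columns(tmp)
print(report)  # {'bad.csv': ['labels']}
```

Extra columns such as id are ignored by the check, in line with the statement that each file can have any number of columns.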

Training on GPU or CPU

You can use either a GPU or a CPU for training. We recommend using a GPU because it is faster.

Environment variables

dataset.text_column_name - default value text
model.epochs - default value 100
dataset.target_column_name - default value label

Artifacts

Confusion matrix

To better cover all labels, the confusion matrix for Multilabel Text Classification is a JSON file. It provides one confusion matrix for each label, in the form [[#True Positives, #True Negatives], [#False Positives, #False Negatives]]:

{
    "labels":[
        "positive",
        "negative"
    ],
    "multilabel_confusion_matrix":[
        [
            [
                83,
                4
            ],
            [
                21,
                4
            ]
        ],
        [
            [
                105,
                1
            ],
            [
                6,
                0
            ]
        ]
    ]
}
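
Per-label precision and recall can be derived from this artifact. The following is a minimal sketch that assumes the cell layout stated above ([[#True Positives, #True Negatives], [#False Positives, #False Negatives]]) and the key names shown in the example. The function name per_label_metrics is hypothetical.

```python
import json


def per_label_metrics(artifact_text):
    """Compute precision and recall per label from the confusion matrix JSON.

    Assumes each matrix is [[TP, TN], [FP, FN]], as described in the text above.
    """
    data = json.loads(artifact_text)
    metrics = {}
    for label, matrix in zip(data["labels"], data["multilabel_confusion_matrix"]):
        (tp, tn), (fp, fn) = matrix
        # Guard against division by zero when a label was never predicted or never present.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        metrics[label] = {"precision": precision, "recall": recall}
    return metrics


artifact = json.dumps({
    "labels": ["positive", "negative"],
    "multilabel_confusion_matrix": [[[83, 4], [21, 4]], [[105, 1], [6, 0]]],
})
metrics = per_label_metrics(artifact)
for label, values in metrics.items():
    print(label, round(values["precision"], 3), round(values["recall"], 3))
```

If the artifact produced by a given package version orders the cells differently, the tuple unpacking is the only line that needs to change.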

Classification report

{
  "positive": {
    "precision": 0.89, "recall": 0.78, "f1-score": 0.84242424242424243, "support": 100
  },
  "negative": {
    "precision": 0.9, "recall": 0.87, "f1-score": 0.86765432236398, "support": 89
  }
}
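
The report lists metrics per label only. As a minimal sketch, assuming the structure shown above, a single summary figure can be computed either as a plain (macro) average over labels or as an average weighted by each label's support. The function name average_scores is hypothetical.

```python
def average_scores(report, metric):
    """Return (macro average, support-weighted average) of one metric across labels."""
    entries = list(report.values())
    total_support = sum(entry["support"] for entry in entries)
    macro = sum(entry[metric] for entry in entries) / len(entries)
    weighted = sum(entry[metric] * entry["support"] for entry in entries) / total_support
    return macro, weighted


# Values copied from the example report above.
report = {
    "positive": {"precision": 0.89, "recall": 0.78, "f1-score": 0.84242424242424243, "support": 100},
    "negative": {"precision": 0.9, "recall": 0.87, "f1-score": 0.86765432236398, "support": 89},
}
macro_precision, weighted_precision = average_scores(report, "precision")
print(round(macro_precision, 4), round(weighted_precision, 4))
```

The weighted average gives more influence to labels with more test examples, which matters when label frequencies are unbalanced.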

Evaluation

This is a CSV file with predictions on the test set used for evaluation.

label, text, predictions, confidence
{'positive', 'negative'}, "I love this actor but I hate his movies", ['positive', 'negative'], [0.9118645787239075, 0.971538782119751]
