AI Center
Last updated Nov 19, 2024

Multilabel Text Classification

Note:

Multilabel Text Classification is currently in public preview.

UiPath® is committed to stability and quality of our products, but preview features are always subject to change based on feedback that we receive from our customers. Using preview features is not recommended for production deployments.


This is a generic, retrainable model for tagging a text with multiple labels. This ML Package must be trained before it is deployed; deploying it untrained fails with an error stating that the model is not trained. It is based on BERT, a self-supervised method for pretraining natural language processing systems. A GPU is recommended, especially during training, and delivers a roughly 5-10x improvement in speed.

Languages

This multilingual model supports the languages listed below. These languages were chosen because they are the 100 languages with the largest Wikipedias:

  • Afrikaans
  • Albanian
  • Arabic
  • Aragonese
  • Armenian
  • Asturian
  • Azerbaijani
  • Bashkir
  • Basque
  • Bavarian
  • Belarusian
  • Bengali
  • Bishnupriya Manipuri
  • Bosnian
  • Breton
  • Bulgarian
  • Burmese
  • Catalan
  • Cebuano
  • Chechen
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Chuvash
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Estonian
  • Finnish
  • French
  • Galician
  • Georgian
  • German
  • Greek
  • Gujarati
  • Haitian
  • Hebrew
  • Hindi
  • Hungarian
  • Icelandic
  • Ido
  • Indonesian
  • Irish
  • Italian
  • Japanese
  • Javanese
  • Kannada
  • Kazakh
  • Kirghiz
  • Korean
  • Latin
  • Latvian
  • Lithuanian
  • Lombard
  • Low Saxon
  • Luxembourgish
  • Macedonian
  • Malagasy
  • Malay
  • Malayalam
  • Marathi
  • Minangkabau
  • Nepali
  • Newar
  • Norwegian (Bokmal)
  • Norwegian (Nynorsk)
  • Occitan
  • Persian (Farsi)
  • Piedmontese
  • Polish
  • Portuguese
  • Punjabi
  • Romanian
  • Russian
  • Scots
  • Serbian
  • Serbo-Croatian
  • Sicilian
  • Slovak
  • Slovenian
  • South Azerbaijani
  • Spanish
  • Sundanese
  • Swahili
  • Swedish
  • Tagalog
  • Tajik
  • Tamil
  • Tatar
  • Telugu
  • Turkish
  • Ukrainian
  • Urdu
  • Uzbek
  • Vietnamese
  • Volapük
  • Waray-Waray
  • Welsh
  • West Frisian
  • Western Punjabi
  • Yoruba

Model details

Input type

JSON

Input description

The text to be classified, as a string. For example: 'I love this actor but I hate his movies.'

Output description

JSON with two lists. The first list contains the predicted labels, and the second list contains the confidence associated with each predicted label (a value between 0 and 1), in the same order.

Example:

{
  "labels": [
    "deliver",
    "payment"
  ],
  "confidence": [
    0.780,
    0.899
  ]
}
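As a minimal sketch of how a caller might consume this output, the snippet below pairs each label with its confidence and optionally filters by a threshold. The helper name `pair_labels` and the `threshold` parameter are hypothetical and not part of the package; the only assumption taken from this page is that the two lists are aligned by position.

```python
import json

# Example response body, copied from the output description above.
response_text = '{"labels": ["deliver", "payment"], "confidence": [0.780, 0.899]}'


def pair_labels(response_json: str, threshold: float = 0.0) -> list[tuple[str, float]]:
    """Return (label, confidence) pairs at or above threshold, highest confidence first.

    Hypothetical helper: assumes labels[i] corresponds to confidence[i].
    """
    result = json.loads(response_json)
    pairs = zip(result["labels"], result["confidence"])
    kept = [(label, score) for label, score in pairs if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


all_pairs = pair_labels(response_text)
confident_pairs = pair_labels(response_text, threshold=0.8)
print(all_pairs)
print(confident_pairs)
```

Filtering on the caller's side keeps the choice of threshold in the workflow rather than in the model, which may be useful when different labels warrant different levels of caution.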

Recommend GPU

By default, a GPU is recommended.

Training enabled

Training is enabled.

Pipelines

This package supports all three types of pipelines (Full Training, Training, and Evaluation). For most use cases, no parameters need to be specified. The model uses advanced techniques to find a performant model. Every training run after the first one uses incremental learning, that is, it starts from the version produced at the end of the previous training run.

Dataset format

The model will read all CSV files in the specified directory. In every CSV file, the model expects two columns or two properties, text and label by default. The names of these two columns and/or properties are configurable using environment variables.

CSV file format

Each CSV file can have any number of columns, but only two will be used by the model. Those columns are specified by the parameters dataset.text_column_name (if not modified, the default value is text) and dataset.target_column_name (if not modified, the default value is labels).

For example, a single CSV file can look like this:

text,labels
"I love this actor but I hate his movies", ['positive', 'negative']
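The following sketch shows one way such a file could be produced and read back with Python's standard `csv` module. It assumes that the label list is stored as a Python-style list literal in the labels column, as the example suggests, and that a standard CSV with the list field quoted is acceptable; neither assumption is confirmed by this page. The helper names `write_dataset` and `read_dataset` are hypothetical.

```python
import ast
import csv
import io

# One training row: the text and the list of labels that apply to it.
rows = [
    ("I love this actor but I hate his movies", ["positive", "negative"]),
]


def write_dataset(rows, handle, text_column="text", target_column="labels"):
    """Write rows as CSV, serializing each label list as a list literal (assumed format)."""
    writer = csv.writer(handle)
    writer.writerow([text_column, target_column])
    for text, labels in rows:
        # csv.writer quotes this field automatically because it contains a comma.
        writer.writerow([text, str(labels)])


def read_dataset(handle, text_column="text", target_column="labels"):
    """Read the CSV back, parsing each list literal into a Python list."""
    reader = csv.DictReader(handle)
    return [(row[text_column], ast.literal_eval(row[target_column])) for row in reader]


buffer = io.StringIO()
write_dataset(rows, buffer)
buffer.seek(0)
round_trip = read_dataset(buffer)
print(round_trip)
```

The column names are passed as arguments so that they can be kept in step with whatever values dataset.text_column_name and dataset.target_column_name are set to.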

Training on GPU or CPU

You can use either GPU or CPU for training. We recommend using GPU since it's faster.

Environment variables

  • dataset.text_column_name - default value text
  • model.epochs - default value 100
  • dataset.target_column_name - default value label

Artifacts

Confusion matrix

To better cover all labels, the confusion matrix for Multilabel Text Classification is a JSON file. It contains one 2x2 confusion matrix per label, in the form [[#True Positives, #True Negatives], [#False Positives, #False Negatives]]:

{
    "labels":[
        "positive",
        "negative"
    ],
    "multilabel_confusion_matrix":[
        [
            [
                83,
                4
            ],
            [
                21,
                4
            ]
        ],
        [
            [
                105,
                1
            ],
            [
                6,
                0
            ]
        ]
    ]
}
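A short sketch of how this artifact might be unpacked into named counts per label is shown below. It takes the cell layout exactly as described above ([[true positives, true negatives], [false positives, false negatives]]), which is an assumption worth verifying against your own results before relying on the names. The helper `per_label_counts` is hypothetical.

```python
import json

# Confusion matrix artifact, copied from the example above.
artifact_text = """
{
    "labels": ["positive", "negative"],
    "multilabel_confusion_matrix": [
        [[83, 4], [21, 4]],
        [[105, 1], [6, 0]]
    ]
}
"""

# Cell names per position, following the layout stated on this page (assumed).
CELL_NAMES = (("tp", "tn"), ("fp", "fn"))


def per_label_counts(artifact: dict) -> dict:
    """Map each label to its named cell counts plus the total number of samples."""
    counts = {}
    for label, matrix in zip(artifact["labels"], artifact["multilabel_confusion_matrix"]):
        cells = {CELL_NAMES[r][c]: matrix[r][c] for r in range(2) for c in range(2)}
        cells["total"] = sum(sum(row) for row in matrix)
        counts[label] = cells
    return counts


counts = per_label_counts(json.loads(artifact_text))
for label, cells in counts.items():
    print(label, cells)
```

Because every sample is scored once per label, the totals should be identical across labels, which gives a quick sanity check that does not depend on the cell layout.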

Classification report

{
  "positive": {
    "precision": 0.89, "recall": 0.78, "f1-score": 0.84242424242424243, "support": 100
  },
  "negative": {
    "precision": 0.9, "recall": 0.87, "f1-score": 0.86765432236398, "support": 89
  }
}
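As an illustration of how the report might be summarized, the sketch below computes a macro-averaged and a support-weighted F1 score from the per-label entries. These averages are not stated to be part of the artifact; the helper `average_f1` is hypothetical, and the only assumption is the per-label structure shown in the example.

```python
import json

# Classification report artifact, copied from the example above.
report_text = """
{
  "positive": {"precision": 0.89, "recall": 0.78, "f1-score": 0.84242424242424243, "support": 100},
  "negative": {"precision": 0.9, "recall": 0.87, "f1-score": 0.86765432236398, "support": 89}
}
"""


def average_f1(report: dict) -> tuple[float, float]:
    """Return (macro F1, support-weighted F1) across all labels in the report."""
    scores = [entry["f1-score"] for entry in report.values()]
    supports = [entry["support"] for entry in report.values()]
    macro = sum(scores) / len(scores)
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    return macro, weighted


macro_f1, weighted_f1 = average_f1(json.loads(report_text))
print(f"macro F1: {macro_f1:.4f}, weighted F1: {weighted_f1:.4f}")
```

The macro average treats every label equally, while the weighted average favors labels with more test samples; comparing the two can hint at whether rare labels are dragging performance down.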

Evaluation

This is a CSV file with predictions on the test set used for evaluation.

label, text, predictions, confidence
{'positive', 'negative'}, "I love this actor but I hate his movies", ['positive', 'negative'], [0.9118645787239075, 0.971538782119751]
