LightTextClassification
This is a generic, retrainable model for text classification. It supports all languages based on Latin characters, such as English, French, Spanish, and others. This ML Package must be trained; if it is deployed without being trained first, the deployment fails with an error stating that the model is not trained. The model operates on a Bag of Words representation and provides explainability based on n-grams.
Input type
JSON and CSV
Input description
Text to be classified as String: 'I loved this movie.'
Output description
JSON with the predicted class, a confidence score (between 0 and 1), and the n-grams that influenced the prediction, each paired with a score.
{
    "class": "7",
    "confidence": 0.1259827300369445,
    "ngrams": [
        [
            "like",
            1.3752658445706787
        ],
        [
            "like this",
            0.032029048484416685
        ]
    ]
}
Recommend GPU
GPU is not required.
Training enabled
By default, training is enabled.
This package supports all three types of pipelines (Full Training, Training, and Evaluation). The model uses hyperparameter search to find a performant configuration. By default, hyperparameter search (the BOW.hyperparameter_search.enable variable) is enabled. The parameters of the most performant model are available in the Evaluation Report.
Dataset format
Three options are available to structure your dataset for this model: JSON, CSV, and AI Center JSON format (this is also the export format of the labeling tool). The model reads all CSV and JSON files in the specified directory. For every format, the model expects two columns or two properties, identified by the dataset.input_column_name and dataset.target_column_name variables. The names of these two columns and/or directories are configurable using environment variables.
CSV file format
Each CSV file can have any number of columns, but only two will be used by the model. Those columns are specified by the dataset.input_column_name and dataset.target_column_name parameters.
Check the following sample and environment variables for a CSV file format example.
text, label
I like this movie, 7
I hated the acting, 9
The environment variables for the previous example would be as follows:
- dataset.input_format: auto
- dataset.input_column_name: text
- dataset.target_column_name: label
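As a rough illustration of the CSV layout above, the following Python sketch writes the two sample rows and then checks that the configured columns are present. The helpers write_csv and check_columns are hypothetical and not part of the package, and an in-memory buffer stands in for a real file in the dataset directory.

```python
import csv
import io

# Column names as configured through the environment variables in the sample above.
INPUT_COLUMN = "text"    # dataset.input_column_name
TARGET_COLUMN = "label"  # dataset.target_column_name

rows = [
    {"text": "I like this movie", "label": "7"},
    {"text": "I hated the acting", "label": "9"},
]


def write_csv(records, handle):
    """Write the records as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=[INPUT_COLUMN, TARGET_COLUMN])
    writer.writeheader()
    writer.writerows(records)


def check_columns(handle):
    """Hypothetical pre-upload check: return (text, label) pairs or raise if a column is missing."""
    # skipinitialspace tolerates headers written as "text, label".
    reader = csv.DictReader(handle, skipinitialspace=True)
    missing = {INPUT_COLUMN, TARGET_COLUMN} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(row[INPUT_COLUMN], row[TARGET_COLUMN]) for row in reader]


buffer = io.StringIO()  # stands in for a file in the dataset directory
write_csv(rows, buffer)
buffer.seek(0)
pairs = check_columns(buffer)
print(pairs)
```

Any extra columns in the file are simply ignored by this check, which mirrors the statement that only two columns are used by the model.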
JSON file format
A single JSON file can contain multiple data points.
Check the following sample and environment variables for a JSON file format example.
[
  {
    "text": "I like this movie",
    "label": "7"
  },
  {
    "text": "I hated the acting",
    "label": "9"
  }
]
The environment variables for the previous example would be as follows:
- dataset.input_format: auto
- dataset.input_column_name: text
- dataset.target_column_name: label
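The JSON layout above can be produced and sanity-checked with a few lines of Python. This is a minimal sketch: find_incomplete is a hypothetical helper, not something the package provides, and the property names are the ones from the sample.

```python
import json

# Property names as configured through the environment variables in the sample above.
INPUT_PROPERTY = "text"    # dataset.input_column_name
TARGET_PROPERTY = "label"  # dataset.target_column_name

records = [
    {"text": "I like this movie", "label": "7"},
    {"text": "I hated the acting", "label": "9"},
]


def find_incomplete(items):
    """Hypothetical check: return the indexes of records lacking either property."""
    return [
        index
        for index, item in enumerate(items)
        if INPUT_PROPERTY not in item or TARGET_PROPERTY not in item
    ]


# Serialize the data points as one JSON array, then read them back.
payload = json.dumps(records, indent=2)
loaded = json.loads(payload)
print(find_incomplete(loaded))
```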
ai_center file format
Files in this format use the .json extension.
Check the following sample and environment variables for an ai_center file format example.
{
    "annotations": {
        "intent": {
            "to_name": "text",
            "choices": [
                "TransactionIssue",
                "LoanIssue"
            ]
        },
        "sentiment": {
            "to_name": "text",
            "choices": [
                "Very Positive"
            ]
        },
        "ner": {
            "to_name": "text",
            "labels": [
                {
                    "start_index": 37,
                    "end_index": 47,
                    "entity": "Stakeholder",
                    "value": " Citi Bank"
                },
                {
                    "start_index": 51,
                    "end_index": 61,
                    "entity": "Date",
                    "value": "07/19/2018"
                },
                {
                    "start_index": 114,
                    "end_index": 118,
                    "entity": "Amount",
                    "value": "$500"
                },
                {
                    "start_index": 288,
                    "end_index": 293,
                    "entity": "Stakeholder",
                    "value": " Citi"
                }
            ]
        }
    },
    "data": {
        "cc": "",
        "to": "[email protected]",
        "date": "1/29/2020 12:39:01 PM",
        "from": "[email protected]",
        "text": "I opened my new checking account with Citi Bank in 07/19/2018 and met the requirements for the promotion offer of $500 . It has been more than 6 months and I have not received any bonus. I called the customer service several times in the past few months but no any response. I request the Citi honor its promotion offer as advertised."
    }
}
To use the previous sample JSON, set the environment variables as follows:
- dataset.input_format: ai_center
- dataset.input_column_name: data.text
- dataset.target_column_name: annotations.intent.choices
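The dotted names in these variables appear to address nested properties of the exported JSON. The sketch below assumes exactly that (an inference from the sample, not a statement about the package's internals) and resolves both names against an abbreviated version of the record. The helper get_by_path is hypothetical.

```python
from functools import reduce


def get_by_path(record, dotted_path):
    """Follow a dotted name such as 'data.text' through nested dictionaries."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), record)


# Abbreviated version of the sample record shown above.
record = {
    "annotations": {
        "intent": {"to_name": "text", "choices": ["TransactionIssue", "LoanIssue"]},
        "sentiment": {"to_name": "text", "choices": ["Very Positive"]},
    },
    "data": {"text": "I opened my new checking account with Citi Bank in 07/19/2018"},
}

settings = {
    "dataset.input_format": "ai_center",
    "dataset.input_column_name": "data.text",
    "dataset.target_column_name": "annotations.intent.choices",
}

text = get_by_path(record, settings["dataset.input_column_name"])
labels = get_by_path(record, settings["dataset.target_column_name"])
print(text)
print(labels)
```

Under the same assumption, pointing dataset.target_column_name at annotations.sentiment.choices would select the sentiment labels instead of the intent labels.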
Training on GPU or CPU
GPU is not required for training.
Environment variables
- dataset.input_column_name - The name of the input column containing the text.
  - Default value is data.text.
  - Make sure that this variable is configured according to your input JSON or CSV file.
- dataset.target_column_name - The name of the target column containing the labels.
  - Default value is annotations.intent.choices.
  - Make sure that this variable is configured according to your input JSON or CSV file.
- dataset.input_format - The input format of the training data.
  - Default value is ai_center.
  - Supported values are: ai_center or auto.
  - If ai_center is selected, only JSON files are supported. Make sure to also change the value of dataset.target_column_name to annotations.sentiment.choices if ai_center is selected.
  - If auto is selected, both CSV and JSON files are supported.
- BOW.hyperparameter_search.enable - The default value for this parameter is True. If left enabled, the search finds the most performant model within the given timeframe and compute resources.
  - This also generates a HyperparameterSearch_report PDF file that shows the parameter variations that were tried.
- BOW.hyperparameter_search.timeout - The maximum time, in seconds, that the hyperparameter search is allowed to run.
  - Default value is 1800.
- BOW.explain_inference - When this is set to True, some of the most important n-grams are returned along with the prediction at inference time, when the model is served as an ML Skill.
  - Default value is False.
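To make the effect of BOW.explain_inference concrete, here is a minimal Python sketch that reads a response shaped like the sample in the Output description and lists the n-grams by score. The helper summarize_prediction is hypothetical. Two assumptions are made: the ngrams property is missing when explanations are disabled, and a larger score means a more influential n-gram.

```python
import json

# Response body shaped like the sample in the Output description.
response_body = """
{
    "class": "7",
    "confidence": 0.1259827300369445,
    "ngrams": [["like", 1.3752658445706787], ["like this", 0.032029048484416685]]
}
"""


def summarize_prediction(body, top_k=3):
    """Hypothetical helper: extract the class, confidence, and highest-scoring n-grams."""
    result = json.loads(body)
    # Assumption: "ngrams" is absent when BOW.explain_inference is False.
    ngrams = result.get("ngrams", [])
    # Assumption: a larger score means a more influential n-gram.
    ranked = sorted(ngrams, key=lambda pair: pair[1], reverse=True)[:top_k]
    return {
        "class": result["class"],
        "confidence": result["confidence"],
        "top_ngrams": [term for term, _score in ranked],
    }


summary = summarize_prediction(response_body)
print(summary)
```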
Optional variables
When BOW.hyperparameter_search.enable is set to True, the optimal values of these variables are searched. For the following optional parameters to be used by the model, set the BOW.hyperparameter_search.enable variable to False:
- BOW.lr_kwargs.class_weight - Supported values are: balanced or None.
- BOW.ngram_range - The range of lengths of consecutive word sequences that can be considered as features for the model.
  - Make sure to follow this format: (1, x), where x is the maximum sequence length you want to allow.
- BOW.min_df - Used to set the minimum number of occurrences of the n-gram in the dataset to be considered as a feature.
  - Recommended values are between 0 and 10.
- dataset.text_pp_remove_stop_words - Used to configure whether or not stop words should be included in the search (for example, words like the, or).
  - Supported values are: True or False.
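The roles of BOW.ngram_range and BOW.min_df can be illustrated with a small, self-contained Python sketch. It is not the package's implementation: extract_ngrams and build_vocabulary are hypothetical helpers, tokenization is a plain whitespace split, and min_df is applied to the total number of occurrences across the dataset, which is one reading of the description above.

```python
from collections import Counter


def extract_ngrams(text, ngram_range=(1, 2)):
    """Return all word n-grams whose length falls within ngram_range (inclusive)."""
    tokens = text.lower().split()  # simplistic tokenization, for illustration only
    low, high = ngram_range
    return [
        " ".join(tokens[start:start + size])
        for size in range(low, high + 1)
        for start in range(len(tokens) - size + 1)
    ]


def build_vocabulary(texts, ngram_range=(1, 2), min_df=1):
    """Keep the n-grams that occur at least min_df times across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(extract_ngrams(text, ngram_range))
    return sorted(term for term, count in counts.items() if count >= min_df)


texts = ["I like this movie", "I like the acting"]
vocabulary = build_vocabulary(texts, ngram_range=(1, 2), min_df=2)
print(vocabulary)  # ['i', 'i like', 'like']
```

Raising x in (1, x) adds longer phrases as candidate features, while raising min_df prunes rare n-grams, so the two settings pull the feature count in opposite directions.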
Data
Evaluation CSV file
This is a CSV file with predictions on the test set used for evaluation. This file also contains the n-grams that impacted the prediction (irrespective of the BOW.explain_inference variable value).