Important:
Communications Mining is now part of UiPath IXP. Check the Introduction in the Overview Guide for more details.

Communications Mining user guide

Last updated Aug 1, 2025

Using general fields

A guide to setting up and training General Fields in the platform.

Defining and setting up your fields

It is important to define the key data points (i.e. fields) that you want to extract from your Communications Mining™ data. These typically facilitate downstream automation, but can also be useful for analytics - particularly in assessing the potential success rate and benefit of automation opportunities.

The definitions below help you understand the difference between general and extraction fields:
  • General fields are fields that you may want to extract and that can be found across multiple topics or labels in a dataset.
  • Extraction fields are fields that are created for, and conditioned on, a specific label. In other words, they are tied to a specific label that you want to automate.
Note: If Generative Extraction is available in your region, it is recommended to use general fields as a backup to extraction fields, in case there are no confident label predictions for a message. Use extraction fields, linked to specific labels, to facilitate end-to-end automation, and general fields for automated triage.

For more details on Generative extraction and on general fields versus extraction fields, check the official documentation. If Generative Extraction is not available in your region, continue to use general fields as normal. The rest of this section provides guidance on how to use general fields.

Ultimately, general field predictions, combined with labels, can facilitate automation by providing the structured data points needed to complete a specific task or process. It’s much more time-efficient to train general fields in your dataset in conjunction with labels, rather than focusing on one and then the other (i.e., training general fields after training a full taxonomy of labels).

Note: For example, if you want to automate Address Change requests, a label captures the request type, whilst general fields capture the various components of the address (for example, Address Line, City, and Postcode / Zip Code). Each prediction is made available via the API, enabling every message to be acted upon.

Understanding general fields

General fields are additional elements of structured data which can be extracted from within the messages in your dataset. General fields include data points such as monetary quantities, dates, currency codes, email addresses, URLs, as well as many other industry specific categories.



As soon as they are enabled, the platform can predict most general fields, except those trained from scratch. It identifies them based on their typical (or, in some instances, very specific) format and on a training set of similar general fields.

Similar to labels, you can accept general field predictions that are correct and reject those that are incorrect, which improves the model's ability to identify them in future.

Types of general fields

General fields can be of the following types:

  • Pre-trained general fields, which are based on a set of standard or custom-defined rules, for example, monetary quantity, URL, and date.
  • General fields trained from scratch, which are based on machine learning. You can train these fields as you would train labels.

Trainable and non-trainable general fields

Trainable general fields

General fields can be trainable in one of two ways:
  • General fields trained from scratch are trainable by nature.
  • All other kinds of general fields can be made trainable when they are enabled.

Trainable general fields are those that update live in the platform based on the training that users provide. For more details on training general fields, check Reviewing and applying general fields.

If you enable training on a pre-trained general field that is based on a set of standard or custom-defined rules, you can refine the platform's understanding of that general field within the parameters of those rules. Essentially, further training on these fields reduces the scope of what the platform considers to be that general field, but does not increase it.

This is because many of these general fields, such as dates (for example, tomorrow) and monetary quantities (for example, £20), need to be normalized into a structured data format for downstream systems. Other general fields, such as ISINs or CUSIPs, must have a set format, so the platform should not be taught to predict anything that does not conform to their defined formats.
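
As a rough illustration of why normalization constrains what can be predicted, the sketch below converts a relative date and a monetary quantity into structured values. The function names, the symbol-to-currency mapping, and the output shapes are assumptions made for this example; they are not the platform's actual normalization logic.

```python
import re
from datetime import date, timedelta

# Hypothetical mapping; treating "$" as USD is an assumption for illustration.
CURRENCY_SYMBOLS = {"£": "GBP", "$": "USD", "€": "EUR"}
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def normalize_relative_date(text, reference_day):
    """Return an ISO 8601 date string, or None if the text is not recognised."""
    offset = RELATIVE_DAYS.get(text.strip().lower())
    if offset is None:
        return None
    return (reference_day + timedelta(days=offset)).isoformat()


def normalize_money(text):
    """Return {'currency', 'amount'} for values like '£20', or None otherwise."""
    match = re.fullmatch(r"\s*([£$€])\s*(\d+(?:\.\d+)?)\s*", text)
    if match is None:
        return None
    return {"currency": CURRENCY_SYMBOLS[match.group(1)], "amount": float(match.group(2))}
```

A value that cannot be normalized (for example, "twenty-ish pounds") returns None in this sketch, which mirrors the idea that training can narrow, but not widen, what counts as a valid value.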

When any trainable general fields are assigned, the platform looks at both the text of the general field and the context of the general field within the rest of the communication, that is, what appears before and after the general field value in the same paragraph and in the paragraphs surrounding it. It learns to better predict the general field based on the values themselves, as well as on how the value appears within the context of the communication.

Non-trainable general fields

If a pre-trained general field is not set as trainable, you can still accept or reject the general field predictions you identify in your dataset. These are updated and refined offline using this in-platform user feedback.

It is helpful for you to accept or reject these general fields when reviewing messages.

To learn more on how to enable general fields on a dataset, check the Enabling, disabling, updating, and creating general fields page.

Pre-built templates for general fields

Note: You can enable all the general fields as trainable, to refine the platform's understanding of them through training, and to reduce the scope of what the platform considers to be a general field of that kind.

Standard template field types for general fields

When configuring general field types, you can select from one of the following pre-built options, through the template option when selecting the data type for the field type:

General field type | Description
Email | An email address.
Currency | A currency code, such as GBP, CHF, or USD.
URL | A uniform resource locator, that is, a web address.
SEDOL | A financial security identifier, short for Stock Exchange Daily Official List, which is 7 characters in length.
BIC Code | A Business Identifier Code (BIC) is an international standard under ISO 9362 for routing business transactions and identifying business parties. The BIC code is 8 or 11 characters in length.
LEI | A Legal Entity Identifier (LEI) is a unique global identifier of legal entities participating in financial transactions. LEI is formatted as a 20-character alpha-numeric code.
ISIN | An International Securities Identification Number (ISIN) uniquely identifies a financial security. ISIN is a 12-character alpha-numeric code.
Mark-to-market (MTM or M2M) | Mark-to-market refers to the fair value of an asset or liability. Mark-to-market is based on the current market price, the price of similar assets and liabilities, or on another objectively assessed fair value.
CUSIP | A CUSIP is a 9-digit number or a 9-character alpha-numeric code that identifies a North American financial security for the purposes of facilitating clearing and settlement of trades.
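
To make the fixed-format idea concrete, here is a minimal sketch that checks only the lengths and character sets described for the identifier templates. The patterns and the `matches_format` helper are assumptions for illustration; the real standards add stricter structure and check digits that this sketch does not verify, and the platform's own rules may differ.

```python
import re

# Length-only patterns derived from the descriptions of each template.
# Hypothetical simplification: real identifiers also carry check digits.
FORMATS = {
    "SEDOL": re.compile(r"[A-Z0-9]{7}"),
    "BIC": re.compile(r"[A-Z0-9]{8}(?:[A-Z0-9]{3})?"),  # 8 or 11 characters
    "LEI": re.compile(r"[A-Z0-9]{20}"),
    "ISIN": re.compile(r"[A-Z0-9]{12}"),
    "CUSIP": re.compile(r"[A-Z0-9]{9}"),
}


def matches_format(field_type, value):
    """Return True if the value has the length and characters expected for the type."""
    pattern = FORMATS[field_type]
    return pattern.fullmatch(value.strip().upper()) is not None
```

For example, `matches_format("BIC", "DEUTDEFF")` passes the length check, while a 9-character value does not, because a BIC is either 8 or 11 characters long.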

Managing general fields

Note:
  • You must have the Source - Read and Dataset - Manage permissions assigned as an Automation Cloud user, or the View sources and Modify Datasets, or Datasets Admin, permissions as a legacy user.
  • You have a default quota of 25 general fields per dataset. If you need more than 25 general fields, request a quota increase through the account team.

Enabling general fields on a new dataset

To enable general fields on a new dataset that you want to create, select them during the setup process.

Select the plus + button in the box shown below and you will be presented with a drop-down menu of all of the general fields that you are able to enable for that dataset. Select all of the general fields you want to enable before creating the dataset. If you add any in error, you can select the X icon next to the general field name to remove it.

To understand more about how to create a new dataset, check Create a new dataset.



Enabling, updating, and disabling general fields on an existing dataset

To enable, update, or disable general fields for an existing dataset, select the Settings tab in the top navigation bar, and then select the Labels and extraction fields tab.

Settings > Labels and extraction fields tab

Enabling general fields

To enable existing general fields, select inside the General Fields box, and then select the general fields you want to enable from the drop-down menu. Once you're happy with your selections, select Update General Fields (as shown below).

These general fields will have their settings pre-selected for you. You can then update them, including making them trainable, as shown below.

General fields tab

Updating general fields

To update an enabled general field, select the general field in the general field box as shown in the previous images and the Edit general field modal will appear as shown in the following image.

Here you can update the base general field, the title of the general field and the API name (these concepts are described in detail below), as well as making the general field 'trainable'.

If you have previously reviewed general fields for a general field kind that was not set to 'trainable', this information is still stored.

Edit general field modal

Disabling general fields

To remove any selected general fields, simply select the 'X' icon next to the general field name, and then select Update General Fields.

Note:

If you remove a general field and select Update General Fields, this will also remove the training data for that general field for this dataset. If you choose to re-enable the general field, you will need to train it again.

If you make a mistake while updating the general fields, select 'Reset' before you select Update General Fields and your changes will not be applied.

Creating new general fields

The previous sections covered how to enable and update existing pre-trained general fields for both new and existing datasets. In each instance, for either a new or existing dataset, you can also create new general fields.

Newly created general fields can be based on an existing pre-trained general field or can be trained from scratch like a new label.

You can do this by selecting the plus + icon in the general field box, either in the Create dataset flow or in the dataset settings page as previously shown.

This will bring up the Add a new general field modal as shown below.

Here you can set the field types, title, and API name, as well as select whether the general field is trainable or not. These can be updated later as previously shown.

When you've filled in each of the fields (explained below), simply select 'Create'.

Create new general field modal

Field types

  • This will serve as the initial state for your new general field, and the dropdown will contain a list of all the pre-trained general fields available to you
    • For example, if you select 'Date' as your base general field, all of the general fields predicted for this kind will be dates, and you could then train the platform to only recognise specific dates
  • If you want to train a general field entirely from scratch, you can select 'None - Train from scratch', and then you essentially start with a blank canvas when training the general field. The platform's predictions for this general field will be entirely based on the training examples you provide

General field title

  • The general field title is the name of the general field that will appear in the UI of the platform

API name

  • The API name of the general field is what will be returned via the API when it provides predictions for messages
  • The API name cannot contain any spaces or punctuation except for dash ( - ) and underscore ( _ )
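
The API name rule can be sketched as a small check. The helper name `is_valid_api_name` is hypothetical, and rejecting an empty name is an assumption; the only rule taken from this guide is that spaces and punctuation other than dash and underscore are not allowed.

```python
import string

# Every ASCII punctuation character except dash and underscore is disallowed.
DISALLOWED_PUNCTUATION = set(string.punctuation) - {"-", "_"}


def is_valid_api_name(name):
    """Return True if the name has no whitespace and no disallowed punctuation."""
    if not name:
        return False  # assumption: an empty API name is not acceptable
    return not any(ch.isspace() or ch in DISALLOWED_PUNCTUATION for ch in name)
```

For instance, `policy-number` and `policy_number` pass this check, whereas `policy number` and `policy.number` do not.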

Managing general fields on an existing dataset

To enable, update, or disable general fields for an existing dataset, proceed as follows:
  1. Open the existing dataset.
  2. Select the Settings tab.
  3. Select Taxonomy, and then Labels and extraction fields.

General field filtering

Note: You must have the Source - Read and Dataset - Read permissions assigned as an Automation Cloud user, or the View sources and View general fields permissions as a legacy user.

Similar to labels, you can filter messages by whether they have general fields predicted or assigned, both in Explore and Reports.

You can apply any combination of the AND, ANY OF, and NOT filters when applying more than one general field filter. These filters can give you much greater flexibility when training and interpreting your data, and can provide much deeper insights on what is happening in your communication channels.

What you can do when filtering by general field predictions:

  • Apply multiple general field filters at once, in both Explore and Reports
  • Filter to messages that have any one of a number of selected general fields predicted. For example, ANY OF general field X, general field Y, and so on.
  • Filter to messages that have multiple different general fields predicted. For example, general field X AND general field Y AND so on.
  • Filter to messages that do not have certain general fields predicted. For example, NOT General field Y.
  • Search for general fields containing specific search terms, while having general field filters applied.

All of the general fields you have enabled on your dataset will appear as shown below in the filter bar. Assigning general fields is covered in detail in Reviewing and applying general fields.

Applying advanced prediction filters

There are now two ways to apply general field filters, and you can use them in combination with each other to create the right type of query.

The default state is the one where no filter is applied and all messages are shown, unless another filter is applied.



To update the general field filter, use the buttons explained in the following table, which also change colour when selected:

Button | Description
[button image] | Show messages containing any annotated general fields.
[button image] | Show messages predicted to contain a general field.

If you want to filter to messages that have any annotated general fields, or that are predicted to contain a general field, use the buttons at the top, as described in the previous table. If you want to filter to messages with specific annotated or predicted general fields, hover over the general field in question and the same two buttons will appear to the right.

If you want to filter to either an assigned or predicted general field, select the name of the general field, and it shows messages with either one of them.

To remove your selection, select the button again. To remove multiple selections, select All. You can also select Clear All from the filter bar, but this will clear every filter you have selected, not just general field filters.

The General field bar

The taxonomy of general fields functions as a normal filter bar, and allows you to select multiple general fields at once, with a single selection for each.

Selecting multiple general fields from the list creates an ANY OF type query.

If you selected General field A, General field B, and General field C in the General field bar, this creates a Show me messages with General field A, General field B, or General field C predicted query.

When filtering to specific general fields, you can make multiple selections. For instance, you could filter to see messages that have an address line general field assigned OR a city general field predicted as shown in the following image.



Add general field filter

The second filter option is the + Add General field filter button.

This enables a dropdown general field bar that allows you to select more complex filters, such as excluding certain general fields from consideration.

From this dropdown, you can select multiple general fields to include or exclude by selecting the name of the general field (for assigned and predicted), or the individual buttons (including minus for where this general field is neither assigned nor predicted).

The result looks like the following example, which returns messages predicted to have the Invoice ID general field, but without the Prod ID general field assigned or predicted:



You can select + Add General field filter multiple times to add additional layers to your query. Two separate general field filters create an AND type query, whilst multiple general fields selected in the same general field filter create an ANY OF type query.

In the example below, multiple general field filters have been applied individually. This creates a filter that will return messages predicted to have any of the three general fields in the first filter, but that also have the Policy Number general field predicted, and do not have the UK Postcode general field predicted or assigned.



A helpful tip: by selecting the & sign in an individual filter that contains multiple general fields, you can automatically split them out into individual filters. This changes the query from ANY OF (any of these general fields predicted) to AND (all of these general fields predicted).
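
The way these filters combine can be sketched with sets. This is an illustrative model only: `FieldFilter`, `message_matches`, and `split_filter` are hypothetical names, the three field names in the first filter are made up for the example, and the sketch treats "assigned" and "predicted" as a single notion of a field being present on a message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldFilter:
    any_of: frozenset = frozenset()   # at least one of these must be present
    none_of: frozenset = frozenset()  # none of these may be present


def message_matches(fields_on_message, filters):
    """Separate filters are combined with AND; names inside one filter are ANY OF."""
    present = set(fields_on_message)
    for f in filters:
        if f.any_of and not (present & f.any_of):
            return False
        if present & f.none_of:
            return False
    return True


def split_filter(f):
    """Mimic the & action: turn one ANY OF filter into one filter per field (AND)."""
    return [FieldFilter(any_of=frozenset({name})) for name in sorted(f.any_of)]


filters = [
    FieldFilter(any_of=frozenset({"Invoice ID", "Order ID", "Prod ID"})),
    FieldFilter(any_of=frozenset({"Policy Number"})),
    FieldFilter(none_of=frozenset({"UK Postcode"})),
]
```

With these filters, a message carrying Order ID and Policy Number matches, but it stops matching once UK Postcode is also present. After `split_filter` is applied to the first filter, the same message no longer matches, because all three fields would then be required.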

Combining general field bar filters and added general field filters

You can combine filters from both the general field bar, and individually added general field filters. Filters applied in the general field bar are treated as an AND query with any individually applied general field filters.

For example, in the image below, this combined query would return any messages that had either ORDER ID or PROD ID predicted.

Combine general field filter using general field bar and individually added general field filters.

Combining general field filters and sorting by general field for training

These filters also mean that you can apply general field filters and sort by a specific general field in a training mode.

Example of the Explore page showing Check general field mode for a specific general field, with an additional general field exclusion filter applied:


Reviewing and applying general fields

Note: You must have assigned the Source - Read and Dataset - Review permissions as an Automation Cloud user, or the View sources and Review and label permissions as a legacy user.

Identifying general field predictions

Predicted general fields appear as colour-highlighted text, such as in the first line of the message depicted in the following image, with a different colour appearing for each different general field type. Once you confirm a general field, either by manually applying it or accepting a prediction, the general field will appear as highlighted text with a bold, darker outline as shown in the following image.

If a paragraph has had general fields assigned, dismissed, or applied, it will appear highlighted in grey, as shown in the body of the message from the following image.



Making general field predictions for trainable general fields

When you review trainable general fields, the platform learns from both the general field values that you assign and the context in which they appear within the communications, that is, the other language used around the values themselves.

The platform will consider the context of the language in the same paragraph as the general field value, as well as the single paragraphs, denoted by a new separated line, directly before and after the paragraph that the general field sits in.
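
The context window described above can be pictured with a short sketch. It assumes paragraphs are separated by new lines and that blank lines are ignored; the function name `context_paragraphs` is hypothetical and this is not the platform's implementation.

```python
def context_paragraphs(message_text, paragraph_index):
    """Return the paragraph at paragraph_index plus its direct neighbours."""
    paragraphs = [p for p in message_text.split("\n") if p.strip()]
    if not 0 <= paragraph_index < len(paragraphs):
        raise IndexError("paragraph_index is outside the message")
    start = max(0, paragraph_index - 1)
    end = min(len(paragraphs), paragraph_index + 2)
    return paragraphs[start:end]
```

For a general field in the first or last paragraph of a message, the window in this sketch simply contains two paragraphs instead of three.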

Note: For general fields that are not set to 'trainable', the platform's predictions are based entirely on the rules defined within the platform for that general field. This can be beneficial for when a general field absolutely has to follow a set format for a downstream automation, with any incorrect values causing a failure or exception.

General field confidence scores

When the platform predicts which general fields apply to a communication, it assigns each prediction a confidence score (%) to show how confident it is that the general field applies to the highlighted span of text. You can view a general field’s confidence score by hovering over the general field.

This confidence score is also made available via the API so that it can inform automated actions taken downstream.
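
As an illustration of how a downstream process might use the confidence score, the sketch below routes a prediction based on a threshold. The dictionary keys, the 0.9 threshold, and the route names are assumptions for this example and do not describe the actual API response.

```python
def route_prediction(prediction, threshold=0.9):
    """Send confident predictions to automation and the rest to manual review."""
    if prediction["confidence"] >= threshold:
        return "automate"
    return "manual_review"
```

In practice, the threshold would be chosen per general field, based on how costly a wrong value is for the downstream system.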



Accepting and rejecting general field predictions

Once general fields are enabled, the platform will automatically start predicting them within the messages throughout your dataset. For more details, check Reviewing and applying general fields. You can then accept the predictions that are correct or reject them where they are incorrect. Each of these actions sends training signals that are used to improve the platform's understanding of that general field.

For the pre-trained general fields that are trained offline, such as monetary quantity, URL, and so on, it is more important from an improvement perspective for you to reject or correct wrong predictions than it is for you to accept correct predictions.

For the general fields that train live in the platform, it is equally important to accept correct predictions and to reject incorrect ones. You do not, however, need to keep accepting many correct examples of each unique general field value (for example, Example Bank Ltd. as a unique value of an organization general field) if you aren't finding incorrectly predicted ones.

Note: The key caveat to this is that if you review any general field in a paragraph, you need to review all of the other general fields in that paragraph.

To review a general field prediction, hover the mouse over the prediction and the general field review modal will appear, as shown in the example from the following image. To accept it, select Confirm, to reject it, select Dismiss.

You can train general fields and labels independently of each other. Reviewing labels for a message does not mean you have to review the general fields in that same message. However, it is good practice to do both at the same time, as this is the most efficient use of your time when training the model.

Important: When training general fields, make sure you consider the best practices explained in this section. The most important best practice is that you do not partially annotate paragraphs.

To understand how well the platform can predict each general field enabled for a dataset, particularly the trainable ones, check Validation for general fields.



Note: Make sure you reject incorrect general field predictions, but if the highlighted text was in fact a different general field (more common for date-related general fields), you should apply the correct one. For more details on applying general fields, check the following section.

Applying general fields

To apply a general field to some text where the platform may not have predicted it, highlight the section of text as you would if you were going to copy it.

A dropdown menu will appear, as shown in the following image, containing all of the general fields that you have enabled for your dataset. Select the correct one to apply it, or press the corresponding keyboard shortcut.

The default keyboard shortcut for each general field is the letter it starts with. If more than one general field starts with the same letter, one of them keeps that letter and a shortcut is assigned at random to the others.



Once a general field has been applied, it will be highlighted in colour with a bold outline as shown in the following image. Each general field type will have its own specific colour.



Note: A value for a given general field type cannot be split across multiple paragraphs. The full value must be contained within a paragraph for it to be extracted as one general field value.

Best practices

The following are some of the most important best practices to consider when accepting, rejecting, or applying general fields within messages:
  • Do not split words.
  • Do not partially annotate paragraphs.

Do not split words

Make sure you do not split words. The highlighted general field should cover the entire word, or several words, in question, not just part of it. Check the following images for an example of an incorrect and a correct application.
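
The "do not split words" rule can be expressed as a simple boundary check on a highlighted span. This is a sketch under the assumption that a span is a pair of character offsets into the message text; `splits_a_word` is a hypothetical helper, not a platform function.

```python
def splits_a_word(text, start, end):
    """Return True if the span [start, end) begins or ends in the middle of a word."""
    if not 0 <= start < end <= len(text):
        raise ValueError("span must lie inside the text and be non-empty")
    starts_mid_word = start > 0 and text[start - 1].isalnum() and text[start].isalnum()
    ends_mid_word = end < len(text) and text[end - 1].isalnum() and text[end].isalnum()
    return starts_mid_word or ends_mid_word
```

In the sentence "Pay Example Bank Ltd today", a span covering "Example Bank" passes this check, whereas a span covering only "ample Bank" is flagged because it starts inside the word "Example".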





Do not partially annotate paragraphs

When annotating, if you assign one label to a message, you should apply all the labels that could apply to that message, otherwise you teach the model that those other labels should not apply. The same is true for general fields, except that general fields are reviewed or applied at the paragraph level, rather than at the level of the whole message.

Paragraphs in a message are separated by new lines. The subject line of an email message is considered its own single paragraph.

Make sure to review or apply all of the general fields within a paragraph across all general field kinds if you review or apply one of them. Applying, accepting or rejecting general fields in a paragraph means that the paragraph is treated as reviewed by the platform from a general field perspective. Therefore, make sure to accept or reject all of the predictions in that paragraph.

The following example shows the different paragraphs that have been reviewed within the email message.



The message depicted in the following image shows the same example where the user has not accepted or rejected all of the general field predictions in a single paragraph. This is incorrect, as the model will falsely treat the monetary quantity general field as an incorrect prediction.
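
The paragraph-level rule described above can be sketched as a check that flags paragraphs where only some predictions have been reviewed. The data shapes (a list of predictions with `id` and `paragraph` keys, and a set of reviewed prediction IDs) are assumptions for this example.

```python
def partially_reviewed_paragraphs(predictions, reviewed_ids):
    """Return paragraph numbers where some, but not all, predictions were reviewed."""
    ids_by_paragraph = {}
    for prediction in predictions:
        ids_by_paragraph.setdefault(prediction["paragraph"], []).append(prediction["id"])

    partial = []
    for paragraph, ids in sorted(ids_by_paragraph.items()):
        reviewed = sum(1 for prediction_id in ids if prediction_id in reviewed_ids)
        if 0 < reviewed < len(ids):
            partial.append(paragraph)
    return partial
```

A paragraph with no reviewed predictions is not flagged, because it is simply untouched; only a mix of reviewed and unreviewed predictions in the same paragraph is a problem.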



Validation for general fields

Introduction

The platform displays validation statistics, warnings and recommended actions for enabled general fields in the Validation page, much like it does for every label in your taxonomy.

To see these, navigate to the Validation page and select the General fields tab at the top, as shown in the image below.



How general field validation works

The process by which the platform validates its ability to correctly predict general fields is very similar to the one it uses for labels.

Messages are split (80:20) into a training set and a test set (determined randomly by the message ID of each message) when they are first added to the dataset. Any general fields that have been assigned (predictions that were accepted or corrected) will fall into the training set or the test set, based on whichever set the message that they're in was assigned to originally.

As there can sometimes be a very large number of general fields in one message and no guarantee whether a message is in the training set or the test set, you may see a large disparity between the number of general fields in each set.

There may also be instances where all of the assigned general fields fall into the training set. As at least one example is required in the test set to calculate the validation scores, this general field would require more assigned examples until some were present in the test set.
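
One way to picture a split that is random overall but fixed per message is to derive it from the message ID. The hashing scheme below is purely an assumption for illustration; this guide only states that the split is 80:20 and determined by the message ID, not how.

```python
import hashlib


def assign_split(message_id, train_percent=80):
    """Deterministically place a message in 'train' or 'test' from its ID."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "train" if bucket < train_percent else "test"


def count_fields_per_split(assigned_fields_by_message):
    """Assigned general fields inherit the split of the message they belong to."""
    counts = {"train": 0, "test": 0}
    for message_id, field_count in assigned_fields_by_message.items():
        counts[assign_split(message_id)] += field_count
    return counts
```

Because every general field follows its message, a single message with many assigned general fields can shift the balance between the two sets noticeably.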

Calculating scores

The individual precision and recall statistics for each general field with sufficient training data are calculated in a very similar way to that of labels:

Precision = No. of matching general fields / No. of predicted general fields

Recall = No. of matching general fields / No. of actual general fields

A 'matching general field' is where the platform has predicted the general field exactly (i.e. no partial matches).

The F1 Score is simply the harmonic mean of both precision and recall.
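
The formulas above can be written out as a short sketch. It assumes each general field occurrence is represented as a (message ID, start, end) tuple, so that only exact span matches count; this representation and the function name are assumptions for the example.

```python
def field_scores(predicted, actual):
    """Return (precision, recall, f1) using exact matches only."""
    predicted, actual = set(predicted), set(actual)
    matching = len(predicted & actual)
    precision = matching / len(predicted) if predicted else 0.0
    recall = matching / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


predicted = {("m1", 0, 5), ("m1", 10, 15), ("m2", 3, 8), ("m3", 0, 4)}
actual = {("m1", 0, 5), ("m1", 10, 15), ("m2", 3, 8), ("m3", 0, 6), ("m4", 2, 9)}
precision, recall, f1 = field_scores(predicted, actual)
```

Here three of the four predictions match exactly, so precision is 3/4 and recall is 3/5. The prediction on message m3 covers only part of the actual value, so it counts as neither a match for precision nor a hit for recall.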

Trainable general fields

It's worth noting that the precision and recall stats shown in this page are most useful for the general fields that are trainable live in the platform, shown in the second column above, as all of the general fields reviewed for these general field kinds will directly impact the platform's ability to predict those general fields.

Hence accepting correct general fields and correcting or rejecting wrong general fields should be done wherever possible.

Pre-trained general fields

For general fields that are pre-trained via template field types, the validation statistics only provide an accurate reflection of performance if you accept a considerable number of correct predictions, as well as correcting wrong ones.

If you only correct wrong predictions, the training and test sets become artificially full of the instances where the platform has struggled to predict a general field, and lack those where it predicts the general field well. As correcting wrong predictions for these general fields does not lead to a real-time update of these general fields (they are updated periodically offline), the validation statistics may not change for some time and could be artificially low.

Accepting lots of correct predictions may not always be convenient, as these general fields are predicted correctly far more often than not. But if the majority of the predictions are correct for these general fields, it is likely that you do not need to worry about their precision and recall stats in the Validation page.

Understanding the summary statistics

The summary stats (average precision, average recall and average F1 score) are simply averages of each of the individual general field scores.

Like with labels, only general fields that have sufficient training data are included in the average scores. Those that do not have sufficient training data to be included have a warning icon next to their name.

Note: The summary stats incorporate all of the general fields with sufficient training data, both those that are trainable live and those that are pre-trained. The predictions for general fields that are pre-trained are often only corrected when they are wrong, and not always accepted when they are right. This means their precision and recall stats can often be artificially low, which would lower the average scores.
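
A minimal sketch of the summary statistics, assuming each general field has its own scores and a flag that says whether it has sufficient training data. The dictionary layout and the `sufficient_data` key are hypothetical.

```python
from statistics import mean


def summary_stats(per_field_scores):
    """Average precision, recall, and F1 over fields with sufficient training data."""
    eligible = [s for s in per_field_scores.values() if s["sufficient_data"]]
    if not eligible:
        return None
    return {metric: mean(s[metric] for s in eligible) for metric in ("precision", "recall", "f1")}
```

Because this is a plain average, a single pre-trained general field with artificially low scores pulls the summary down as much as any other field would.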

Metrics

The General fields Validation page shows the average general field performance statistics, as well as a chart showing the average F1 score of each general field versus its training set size. The chart also flags general fields that have amber or red performance warnings.



The general field performance statistics shown are:

  • Average F1 Score: Average of F1 scores across all general fields with sufficient data to accurately estimate performance. This score weighs recall and precision equally. A model with a high F1 score produces fewer false positives and negatives.
  • Average Precision: Average of precision scores across all general fields with sufficient data to accurately estimate performance. A model with high precision produces fewer false positives.
  • Average Recall: Average of recall scores across all general fields with sufficient data to accurately estimate performance. A model with high recall produces fewer false negatives.
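As a rough illustration of how these averages behave, the following sketch computes each field's F1 score as the harmonic mean of its precision and recall, then averages only the fields flagged as having sufficient data. The `FieldScore` class, the `has_sufficient_data` flag and the sample numbers are assumptions made for this example; the platform's internal calculation is not described on this page and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldScore:
    """Hypothetical container for one general field's validation scores."""
    name: str
    precision: float
    recall: float
    has_sufficient_data: bool  # assumption: False for fields shown with a warning icon


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def summary_stats(scores: List[FieldScore]) -> Optional[Dict[str, float]]:
    """Average precision, recall and F1 over fields with sufficient data."""
    included = [s for s in scores if s.has_sufficient_data]
    if not included:
        return None  # nothing to average
    n = len(included)
    return {
        "average_precision": sum(s.precision for s in included) / n,
        "average_recall": sum(s.recall for s in included) / n,
        "average_f1": sum(f1_score(s.precision, s.recall) for s in included) / n,
    }


# Made-up scores: the third field is excluded from the averages.
example_scores = [
    FieldScore("policy-number", precision=1.0, recall=0.5, has_sufficient_data=True),
    FieldScore("customer-id", precision=0.5, recall=0.5, has_sufficient_data=True),
    FieldScore("trade-id", precision=0.0, recall=0.0, has_sufficient_data=False),
]
stats = summary_stats(example_scores)
```

Because fields without sufficient data are skipped, a newly created field with very few examples would not drag the averages down in this sketch.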

Understanding general field performance

The general field performance chart shown in the Metrics tab of the Validation page gives an immediate visual indication of how each individual general field is performing. For more details, check the previous section.

For a general field to appear on this chart, it must have at least 20 pinned examples present in the training set used by the platform during validation. To ensure that this happens, users should make sure they provide a minimum of 25 (often more) pinned examples per general field from 25 different messages.

Each general field will be plotted as one of three colours, based on the model's understanding of how the general field is performing. Below, we explain what these mean:



General field performance indicators

  • Blue - Those general fields plotted as blue on the chart have a satisfactory performance level. This is based on numerous contributing factors, including number and variety of examples and average precision for that general field
  • Amber - General fields plotted as amber have slightly less than satisfactory performance. They may have relatively low average precision or not quite enough training examples. These general fields require a bit of training / correction to improve their performance
  • Red - General fields plotted as red are poorly performing general fields. They may have very low average precision or not enough training examples. These general fields may require considerably more training / correction to bring their performance up to a satisfactory level
Note: You will see the amber and red performance indicators appear in the general field filter bars in Explore, Reports and Validation. This helps to quickly notify you which general fields need some help, and also which general fields' predictions should not be relied upon (without some work to improve them) when using the analytics features.

Individual general field performance

Users can select individual general fields from the general field filter bar (or by selecting the general field's plot on the All general fields chart) in order to see the general field's performance statistics.

The specific general field view will also show any performance warnings and recommended next best action suggestions to help improve its performance.

The general field view will show the average F1 score for the general field, as well as its precision and recall.

Example general field card with recommended actions.

Improving general field performance

Note: You must have the Dataset - Review permission assigned as an Automation Cloud user, or the Review and annotate permission as a legacy user.

Overview

Like training labels, training general fields is the process by which a user teaches the platform which general fields apply on a given message using various training modes.

Like with labels, the Teach, Check, and Missed modes are available to help train and improve the performance of general fields and can be accessed either 1) on the Explore page using the training dropdown, or 2) by following the recommended actions on the General fields tab of the Validation page.

The following image depicts the dropdown menu containing the general field training modes in Explore:



General field recommended actions

If a specific general field has a performance warning, the platform recommends the next best actions that it thinks will help address that warning, listed in order of priority. These are shown when you select a specific general field from the taxonomy or the All general fields chart.

The next best action suggestions act as links that take you directly to the training view that the platform suggests in order to improve the general field's performance. The suggestions are ordered so that the highest priority action to improve the general field is listed first.

This is the most important tool to help you understand the performance of your general fields, and should regularly be used as a guide when trying to improve general field performance.

Check this example of a general field card with recommended actions:



General field training modes

The following table summarises when the platform recommends each general field training mode:

Teach General field
  • Shows predictions for a general field where the model is uncertain if it applies or not.
  • For training general fields on unreviewed messages.

Check General field
  • Shows messages where the platform thinks the general field may have been misapplied.
  • For training general fields on reviewed messages to try to find and correct any inconsistencies.

Missed General field
  • Shows messages that the platform thinks may be missing the selected general field.
  • For training general fields on reviewed messages to try to find and correct any inconsistencies.

Using Teach General field

Using Teach General field boosts general field performance, because the model is being given new information on messages it is unsure about, as opposed to ones that it already has highly confident predictions for.



The platform recommends Teach General Fields when:

  • There is a performance warning next to a general field as shown in the following image. This occurs when the minimum of 25 examples has not been provided.
  • The F1 score on a given general field is low.
  • There may not always be obvious context within the text for a general field, or there is lots of variation within the general field values for a given type.

The following image contains an example of training a general field in Teach General Fields mode:



Using Check General Fields

Using Check General Fields helps identify inconsistencies in the reviewed set, while improving the model's understanding of the general field, by ensuring that the model has correct and consistent examples to make predictions from. This will improve the recall of a general field.

The platform recommends Check General Fields when:

  • There is low recall, but high precision.
  • The predictions the platform makes are very accurate, but it fails to catch many of the places where the general field has been applied.
This is an example of training a general field in Check general fields mode:



For more details on calculations for general field validation, check the Validation for general fields page.

Using Missed General Field

Using Missed General Field helps find examples in the reviewed set that should have the selected general field but do not. It also helps identify partially annotated messages, which can be detrimental to the model's ability to predict a general field. This will improve the precision of a general field and ensure the model has correct and consistent examples to make predictions from.

The platform recommends Missed General Field when:

  • There is high recall, but low precision.
  • You are incorrectly predicting general fields a lot, but when you do predict them correctly, you catch many of the examples that should be there.
The following image contains an example of training a general field in Missed General Field mode:



For more details on calculations for general field validation, check the Validation for general fields page.
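The guidance for the three training modes can be summarised as a simple rule of thumb. The following sketch is illustrative only: the function name, the 0.7 cut-off used for "low" and the order of the checks are assumptions made for this example, not the platform's actual recommendation logic. Only the 25-example minimum comes from the guidance above.

```python
from typing import Optional


def suggest_training_mode(
    precision: float,
    recall: float,
    example_count: int,
    low: float = 0.7,        # assumption: scores below this count as "low"
    min_examples: int = 25,  # minimum number of examples recommended above
) -> Optional[str]:
    """Return a hypothetical training mode suggestion for one general field."""
    if example_count < min_examples:
        return "Teach"   # not enough examples yet
    if recall < low <= precision:
        return "Check"   # low recall, but high precision
    if precision < low <= recall:
        return "Missed"  # high recall, but low precision
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    if f1 < low:
        return "Teach"   # low F1 score overall
    return None          # no suggestion in this sketch


suggestion = suggest_training_mode(precision=0.9, recall=0.4, example_count=40)
```

In practice, the recommended actions shown on the general field card remain the authoritative guide; a helper like this is only a way to reason about why a particular mode was suggested.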

Building custom regex general fields

Note:
  • You must have the Dataset - Manage permission assigned as an Automation Cloud user, or the Modify datasets permission as a legacy user.
  • You can build custom Regex general fields through the Dataset settings, or through the Manage general fields option in the Generative Extraction field annotation experience, which is explained in detail on the Generative extraction page.

Custom Regex general fields

Use custom Regex general fields to extract and format spans of text that have a known repetitive structure, such as IDs or reference numbers.

This is a useful option for simple, structured general fields with little variation. In case of general fields with significant variation and where the context has a big influence on predictions, a machine-learning based general field is the right choice. You can use combinations of the two in any dataset within Communications Mining™.

A broader Regex (i.e., set of rules to define the general field) can also be used as the base of a custom general field. This combines the rules with contextual, machine learning based refinement through training within Communications Mining to create sophisticated custom general fields. This provides optimal performance as well as the necessary restrictions on the values extracted for automation.

Custom Regex Template

A Custom Regex general field is made up of a field type with the Regex data type, which in turn has one or more custom Regex Templates. Each template expresses one way to extract (and format) the general field.

Combined together, these templates offer a flexible and powerful way to cover multiple representations of the same general field type.

A template is made of the following:

  1. The regex (regular expression), which describes the constraints that need to be met by a span of text to be extracted as a general field.
  2. The formatting, which expresses how to normalise the extracted string into a more standard format.

For instance, suppose your customer IDs are either the word ID followed by 7 digits, or an alphanumeric string of 9 characters. The following image shows what your two templates would look like:





Type-ahead validation

When typing into the text box for either the Regex or the Formatting, the interface will provide immediate feedback on the validity of the input. For instance, the invalid input Regex ID\d{} will show:

Extraction preview

The Custom Regex Template can be tested on text to ensure that it behaves as expected. Any general field that would be extracted with the Template will be shown in a list, with its value, as well as the position of the start and end characters.

For instance, if the Regex is \d{4} and the formatting ID-{$}, the following test string will show one extraction:
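A comparable preview can be approximated outside the platform. The following sketch uses Python's `re` module as a stand-in for the platform's regex engine and hard-codes the `ID-{$}` formatting as a string prefix. The `preview` function and the test string are hypothetical, and the end position follows Python's convention (exclusive), which may not match what the interface displays.

```python
import re
from typing import Dict, List, Union


def preview(regex: str, text: str, prefix: str = "ID-") -> List[Dict[str, Union[str, int]]]:
    """List each extraction with its formatted value and character positions."""
    return [
        {"value": prefix + m.group(), "start": m.start(), "end": m.end()}
        for m in re.finditer(regex, text)
    ]


# Hypothetical test string containing a single run of four digits.
extractions = preview(r"\d{4}", "My code is 1234")
# extractions == [{"value": "ID-1234", "start": 11, "end": 15}]
```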


Regex

The regex is the pattern used to extract general fields in the text. Check the syntax documentation.

Named capture groups can be used to identify a specific section of the extracted string for subsequent formatting. The names of the capture groups should be unique across all templates, and should only contain lowercase letters or digits.
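The naming rules above can be checked before templates are entered into the platform. The following sketch is one possible way to do so, using Python's `re` module as a stand-in for the platform's regex engine; `check_group_names` is a hypothetical helper, and Python's own restrictions on group names (for example, they cannot start with a digit) may be stricter than the platform's.

```python
import re
from typing import Dict, List

# Rule from the text above: lowercase letters or digits only.
NAME_RULE = re.compile(r"[a-z0-9]+")


def check_group_names(template_regexes: List[str]) -> List[str]:
    """Return a list of problems found in the named capture groups."""
    problems: List[str] = []
    seen: Dict[str, int] = {}  # group name -> index of the template that first used it
    for index, pattern in enumerate(template_regexes):
        for name in re.compile(pattern).groupindex:
            if not NAME_RULE.fullmatch(name):
                problems.append(f"template {index}: invalid group name '{name}'")
            if name in seen:
                problems.append(
                    f"template {index}: group name '{name}' already used in template {seen[name]}"
                )
            else:
                seen[name] = index
    return problems


ok = check_group_names([r"(?P<id1>\d{3})", r"(?P<id2>\d{4})"])
clash = check_group_names([r"ID(?P<digits>\d{7})", r"REF(?P<digits>\d{7})"])
```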

Formatting

Formatting can be provided to post-process the extracted general field.

By default, no formatting is applied and the string returned by the platform will be the string extracted by the regex. However, if needed, more complex transformations can be defined, using the following rules.

Variables

Any named capture group defined in the regex will be available to use in the formatting logic as a variable, prefixed with the $ symbol. Note that the $ symbol by itself represents the full regex match.
Variables can then be used in the formatting string to insert the corresponding extracted span into the value returned by the platform; the variable name needs to be surrounded by { and } braces.
For instance, if we want to extract seven digits as an ID, and return these seven digits prefixed with ID- then the regex and the formatting would be:


Or, using a named capture group:


Later on, if the platform is given the My identification number is 1234567 text, it will return one general field: ID-1234567
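This behaviour can be imitated in Python to sanity-check a template before entering it. In the following sketch, Python's `re` module stands in for the platform's regex engine, and `format_match` is a hypothetical helper that supports only `{$}` and `{$name}` placeholders. The regex `\d{7}`, its named-group variant and the two formatting strings are assumed from the description above rather than copied from the platform.

```python
import re
from typing import List

# Matches {$} (full match) or {$name} (named capture group).
PLACEHOLDER = re.compile(r"\{\$(\w*)\}")


def format_match(match: "re.Match[str]", formatting: str) -> str:
    """Substitute {$} with the full match and {$name} with a named group."""
    def resolve(placeholder: "re.Match[str]") -> str:
        name = placeholder.group(1)
        if not name:
            return match.group(0)
        return match.group(name) or ""  # empty if the group did not take part
    return PLACEHOLDER.sub(resolve, formatting)


def extract(regex: str, formatting: str, text: str) -> List[str]:
    """Return the formatted value of every match of regex in text."""
    return [format_match(m, formatting) for m in re.finditer(regex, text)]


text = "My identification number is 1234567"
plain = extract(r"\d{7}", "ID-{$}", text)             # full match variable
named = extract(r"(?P<id>\d{7})", "ID-{$id}", text)   # named capture group
```

Both variants produce the same single value for this text, which is the point of the two alternatives: a named capture group only becomes necessary when the formatting needs part of the match rather than all of it.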

String Operations

Raw strings can be used, and strings can be concatenated using the & symbol.

Regex: (?P<id1>\b\d{3}\b)|(?P<id2>\b\d{4}\b)
Formatting: {$id1 & "-" & $id2}
Text: The first id is 123 and the second one is 4567
General Field returned by the platform: 123-4567

Functions

Some functions can also be used in the formatting to transform the extracted string. The names of the functions and their signatures are inspired by Excel.

Upper

Converts all characters in the extracted span to uppercase:

Regex: \w{3}
Formatting: {upper($)}
Text: abc
General Field returned by the platform: ABC

Lower

Converts all characters in the extracted span to lowercase:

Regex: \w{3}
Formatting: {lower($)}
Text: AbC
General Field returned by the platform: abc

Proper

Capitalises the extracted span:

Regex: \w+\s\w+
Formatting: {proper($)}
Text: albert EINSTEIN
General Field returned by the platform: Albert Einstein
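For readers who want to predict the result of the three case functions before testing a template, the following sketch maps each one to its closest Python string method. This is an approximation based only on the examples above; edge cases (such as apostrophes or digits inside words) may be handled differently by the platform.

```python
def upper(text: str) -> str:
    """Approximation of the Upper formatting function."""
    return text.upper()


def lower(text: str) -> str:
    """Approximation of the Lower formatting function."""
    return text.lower()


def proper(text: str) -> str:
    """Approximation of Proper: capitalise the first letter of each word."""
    return text.title()


results = (upper("abc"), lower("AbC"), proper("albert EINSTEIN"))
# results == ("ABC", "abc", "Albert Einstein")
```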

Pad

Pads the extracted span up to a given size with a given character.

Function arguments:

  1. The text containing the characters to be padded
  2. Size of the padded string
  3. Character to be used for padding

Regex: \d{2,5}
Formatting: {pad($, 5, "0")}
Text: 123
General Field returned by the platform: 00123

Substitute

Replaces characters with other characters.

Function arguments:

  1. The text containing the characters to be substituted
  2. What characters to replace
  3. What the old characters should be replaced with

Regex: ab
Formatting: {substitute($, "a", "12")}
Text: ab
General Field returned by the platform: 12b
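The Pad and Substitute examples can be reproduced with the following Python sketch. Two details are assumptions inferred from the examples rather than stated in the text: that padding is added on the left, and that every occurrence of the old characters is replaced.

```python
def pad(text: str, size: int, char: str) -> str:
    """Approximation of Pad: left-pad text with char up to the given size."""
    # Assumption: padding goes on the left, as in the 123 -> 00123 example.
    return text.rjust(size, char)


def substitute(text: str, old: str, new: str) -> str:
    """Approximation of Substitute: replace old with new."""
    # Assumption: all occurrences are replaced, not only the first.
    return text.replace(old, new)


padded = pad("123", 5, "0")               # "00123"
substituted = substitute("ab", "a", "12")  # "12b"
```

Text that is already at least as long as the requested size is returned unchanged in this sketch.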

Left

Returns the first n characters from the span.

Function arguments:

  1. The text containing the characters to be extracted
  2. The number of characters to return

Regex: \w{4}
Formatting: {left($, 2)}
Text: ABCD
General Field returned by the platform: AB

Right

Returns the last n characters from the span.

Function arguments:

  1. The text containing the characters to be extracted
  2. The number of characters to return

Regex: \w{4}
Formatting: {right($, 2)}
Text: ABCD
General Field returned by the platform: CD

Mid

Returns n characters after the specified position from the span.

Function arguments:

  1. The text containing the characters to be extracted
  2. The position of the first character to return
  3. The number of characters to return

Regex: \w{5}
Formatting: {mid($, 2, 3)}
Text: ABCDE
General Field returned by the platform: BCD
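The Left, Right and Mid examples can be reproduced with Python slicing, as in the following sketch. It assumes that the position passed to Mid is 1-based, which is consistent with the example above and with the Excel functions these are modelled on; behaviour for out-of-range arguments is not described on this page and is a guess here.

```python
def left(text: str, n: int) -> str:
    """Approximation of Left: the first n characters."""
    return text[:n]


def right(text: str, n: int) -> str:
    """Approximation of Right: the last n characters."""
    return text[max(len(text) - n, 0):]


def mid(text: str, start: int, n: int) -> str:
    """Approximation of Mid: n characters from the 1-based position start."""
    begin = start - 1  # convert the 1-based position to a 0-based index
    return text[begin:begin + n]


results = (left("ABCD", 2), right("ABCD", 2), mid("ABCDE", 2, 3))
# results == ("AB", "CD", "BCD")
```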

© 2005-2025 UiPath. All rights reserved.