Communications Mining user guide
Using general fields
A guide to setting up and training General Fields in the platform.
It is important to define the key data points (i.e. fields) that you want to extract from your Communications Mining™ data. These typically facilitate downstream automation, but can also be useful for analytics - particularly in assessing the potential success rate and benefit of automation opportunities.
- General fields are fields that you may want to extract, that can be found across multiple different topics/labels in a dataset.
- Extraction fields are fields conditioned and created on a specific label. In other words, they are tied to a specific label that you want to automate.
Check the official documentation to find out more about Generative extraction and the difference between general and extraction fields. If Generative Extraction is not available in your region, continue to use general fields as normal. The rest of this section provides guidance on how to use general fields.
Ultimately, general field predictions, combined with labels, can facilitate automation by providing the structured data points needed to complete a specific task or process. It’s much more time-efficient to train general fields in your dataset in conjunction with labels, rather than focusing on one and then the other (i.e., training general fields after training a full taxonomy of labels).
General fields are additional elements of structured data which can be extracted from within the messages in your dataset. General fields include data points such as monetary quantities, dates, currency codes, email addresses, URLs, as well as many other industry specific categories.
The platform can predict most general fields as soon as they are enabled (except those trained from scratch), as it can identify them based on their typical, or in some instances very specific, format and a training set of similar general fields.
Similar to labels, you are able to accept or reject general fields that are correctly or incorrectly predicted, enhancing the ability of the model to identify them in future.
General fields can be of the following types:
- Pre-trained general fields, which are based on a set of standard or custom-defined rules, for example, monetary quantity, URL, and date.
- General fields trained from scratch, which are based on machine learning. You can train these fields as you would train labels.
Trainable general fields
- General fields trained from scratch are trainable by nature.
- All other kinds of general fields can be made trainable when they are enabled.
Trainable general fields are those that will update live in the platform based on training that users provide. For more details on training general fields, check Reviewing and applying general fields.
If you enable training on a pre-trained general field that is based on a set of standard or custom-defined rules, you can refine the platform's understanding of that general field within the parameters of those rules. Essentially, further training on these will reduce the scope of what the platform considers that general field, but not increase it.
This is because many of these general fields, such as dates (for example, tomorrow) and monetary quantities (for example, £20), need to be normalized into a structured data format for downstream systems. Other general fields, such as ISINs or CUSIPs, must conform to a set format, so the platform should not be taught to predict anything that does not match their defined formats.
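As a purely illustrative sketch (not the platform's actual normalization logic), the snippet below shows the kind of structured output a downstream system might expect for the examples above. The helper names and output shapes are hypothetical.

```python
# Illustrative only: hypothetical helpers showing the kind of structured output
# a downstream system might expect. This is not the platform's implementation.
import re
from datetime import date, timedelta

def normalize_monetary(text: str):
    """Turn a span such as '£20' into an amount/currency pair."""
    match = re.match(r"([£$€])\s?(\d+(?:\.\d+)?)", text)
    if not match:
        return None
    symbol_to_code = {"£": "GBP", "$": "USD", "€": "EUR"}
    return {"amount": float(match.group(2)), "currency": symbol_to_code[match.group(1)]}

def normalize_relative_date(text: str, today: date):
    """Turn a span such as 'tomorrow' into an ISO date string."""
    offsets = {"yesterday": -1, "today": 0, "tomorrow": 1}
    offset = offsets.get(text.strip().lower())
    return (today + timedelta(days=offset)).isoformat() if offset is not None else None

print(normalize_monetary("£20"))                               # {'amount': 20.0, 'currency': 'GBP'}
print(normalize_relative_date("tomorrow", date(2024, 1, 31)))  # 2024-02-01
```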
When any trainable general fields are assigned, the platform looks at both the text of the general field and the context of the general field within the rest of the communication, that is, what appears before and after the general field value in the same paragraph and in the surrounding paragraphs. It learns to better predict the general field based on the values themselves, as well as how each value appears within the context of the communication.
Non-trainable general fields
If a pre-trained general field is not set as trainable, you can still accept or reject the general field predictions you identify in your dataset. These are updated and refined offline using this in-platform user feedback.
It is helpful for you to accept or reject these general fields when reviewing messages.
To learn more on how to enable general fields on a dataset, check the Enabling, disabling, updating, and creating general fields page.
When configuring general field types, you can select one of the following pre-built options through the template option when selecting the data type for the field type:
General field type | Description |
---|---|
Email | An email address. |
Currency | A currency code, such as GBP, CHF, or USD. |
URL | A uniform resource locator, that is, web address. |
SEDOL | A financial security identifier, short for Stock Exchange Daily Official List, which is 7 characters in length. |
BIC Code | A Business Identifier Code (BIC) is an international standard under ISO 9362 for routing business transactions and identifying business parties. The BIC code is 8 or 11 characters in length. |
LEI | A Legal Entity Identifier (LEI) is a unique global identifier of legal entities participating in financial transactions. LEI is formatted as a 20-character alpha-numeric code. |
ISIN | An International Securities Identification Number (ISIN) uniquely identifies a financial security. ISIN is a 12-character alpha-numeric code. |
Mark-to-market (MTM or M2M) | Mark-to-market refers to the fair value of an asset or liability. Mark-to-market is based on the current market price, the price of similar assets and liabilities, or another objectively assessed fair value. |
CUSIP | A CUSIP is a 9-digit number or a 9-character alpha-numeric code that identifies a North American financial security for the purposes of facilitating clearing and settlement of trades. |
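If you need a rough sense of what these formats look like in practice, the following sketch shows simplified regular expressions based only on the lengths described in the table above. These are illustrative assumptions, not the platform's built-in patterns, and they ignore check digits and other structural rules.

```python
# Simplified, illustrative patterns derived only from the lengths in the table
# above. Real identifiers also use check digits and stricter structure, and the
# platform's built-in templates may differ.
import re

FORMAT_PATTERNS = {
    "SEDOL": r"[A-Z0-9]{7}",                  # 7 characters
    "BIC":   r"[A-Z0-9]{8}(?:[A-Z0-9]{3})?",  # 8 or 11 characters
    "LEI":   r"[A-Z0-9]{20}",                 # 20-character alpha-numeric code
    "ISIN":  r"[A-Z0-9]{12}",                 # 12-character alpha-numeric code
    "CUSIP": r"[A-Z0-9]{9}",                  # 9-character code
}

def matches_format(kind: str, value: str) -> bool:
    return re.fullmatch(FORMAT_PATTERNS[kind], value.upper()) is not None

print(matches_format("ISIN", "US0378331005"))  # True
print(matches_format("SEDOL", "B0YBKJ7"))      # True
```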
- You must have the Source - Read and Dataset - Manage permissions assigned as an Automation Cloud user, or the View sources and Modify Datasets or Datasets Admin permissions as a legacy user.
- You have a default quota of 25 general fields per dataset. If you need more than 25 general fields, request a quota increase through the account team.
To enable general fields on a new dataset that you want to create, select them during the setup process.
Select the + button in the box shown below and you will be presented with a drop-down menu of all of the general fields that you can enable for that dataset. Select all of the general fields you want to enable before creating the dataset. If you add any in error, you can select the X icon next to the general field name to remove it.
To understand more about how to create a new dataset, check Create a new dataset.
If you want to enable, update, or disable general fields for an existing dataset, you can do so by opening the Settings tab on the top navigation bar and then selecting the Labels and extraction fields tab.
To enable existing general fields, select inside the General Fields box, and select the general fields you want to enable from the drop down menu. Once you're happy with your selections, select Update General Fields (as shown below).
These general fields will have their settings pre-selected for you. You can then update them, including making them trainable, as shown below.
To update an enabled general field, select the general field in the general field box as shown in the previous images and the Edit general field modal will appear as shown in the following image.
Here you can update the base general field, the title of the general field and the API name (these concepts are described in detail below), as well as making the general field 'trainable'.
If you have previously reviewed general fields for a general field kind that was not set to 'trainable', this information is still stored.
To remove any selected general fields, simply select the 'X' icon next to the general field name, and then select Update General Fields.
If you remove a general field and select Update General Fields, this will also remove the training data for that general field for this dataset. If you choose to re-enable the general field, you will need to train it again.
If you make a mistake while updating the general fields, select 'Reset' before you select Update General fields and your changes will not be applied.
The previous sections covered how to enable and update existing pre-trained general fields for both new and existing datasets. In each instance, for either a new or existing dataset, you can also create new general fields.
Newly created general fields can be based on an existing pre-trained general field or can be trained from scratch like a new label.
To create a new general field, select the + icon in the general field box, either in the Create dataset flow or in the dataset settings page, as previously shown.
This will bring up the Add a new general field modal as shown below.
Here you can set the field types, title, and API name, as well as select whether the general field is trainable or not. These can be updated later as previously shown.
When you've filled in each of the fields (explained below), simply select 'Create'.
Field types
- This will serve as the initial state for your new general field, and the dropdown will contain a list of all the pre-trained general fields available to you.
- For example, if you select 'Date' as your base general field, all of the general fields predicted for this kind will be dates, and you could then train the platform to only recognise specific dates.
- If you want to train a general field entirely from scratch, you can select 'None - Train from scratch', and you essentially start with a blank canvas when training the general field. The platform's predictions for this general field will be entirely based on the training examples you provide.
General field title
- The general field title is the name of the general field that will appear in the UI of the platform.
API name
- The API name of the general field is what will be returned via the API when it provides predictions for messages
- The API name cannot contain any spaces or punctuation except for dash ( - ) and underscore ( _ )
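As a quick illustration of this naming rule, the check below mirrors the constraint described above; it assumes letters, digits, dashes, and underscores only, and is not an official validator.

```python
# Hypothetical check mirroring the rule above: no spaces or punctuation other
# than dash (-) and underscore (_). Not an official validator.
import re

def is_valid_api_name(name: str) -> bool:
    return re.fullmatch(r"[A-Za-z0-9_-]+", name) is not None

print(is_valid_api_name("invoice-id"))   # True
print(is_valid_api_name("invoice id"))   # False: contains a space
```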
Similar to labels, you can filter messages by whether they have general fields predicted or assigned, both in Explore and Reports.
You can apply any combination of the AND, ANY OF, and NOT filters when applying more than one general field filter. These filters can give you much greater flexibility when training and interpreting your data, and can provide much deeper insights on what is happening in your communication channels.
What you can do when filtering by general field predictions:
- Apply multiple general field filters at once, in both Explore and Reports
- Filter to messages that have any one of a number of selected general fields predicted. For example, ANY OF General field X, General field Y, and so on.
- Filter to messages that have multiple different general fields predicted. For example, General field X AND General field Y, and so on.
- Filter to messages that do not have certain general fields predicted. For example, NOT General field Y.
- Search for general fields containing specific search terms, while having general field filters applied.
All of the general fields you have enabled on your dataset will appear as shown below in the filter bar. Assigning general fields is covered in detail in Reviewing and applying general fields.
There are now two ways to apply general field filters, and you can use them in combination with each other to create the right type of query.
The default state is the one where no filter is applied and all messages are shown, unless another filter is applied.
To update the general field filter, use the following buttons, which change colour when selected:
- Show messages containing any annotated general fields.
- Show messages predicted to contain a general field.
If you want to filter to messages that have any annotated general fields, or that are predicted to contain a general field, use the buttons at the top as described above. If you want to filter to messages with specific annotated or predicted general fields, hover over the general field in question and the same two buttons will appear to the right.
If you want to filter to either an assigned or predicted general field, select the name of the general field, and it shows messages with either one of them.
To remove your selection, select the button again, and to remove multiple selections, select All. You can also select Clear All from the filter bar, but this will clear every filter you have selected, not just general field filters.
The taxonomy of general fields functions as a normal filter bar, and allows you to select multiple general fields at once with a single select for each.
Selecting multiple general fields from the list creates an ANY OF type query.
If you selected General field A, General field B, and General field C in the General field bar, this creates a Show me messages with General field A, General field B, or General field C predicted query.
When filtering to specific general fields, you can make multiple selections. For instance, you could filter to see messages that have an address line general field assigned OR a city general field predicted as shown in the following image.
The second filter option is the + Add General field filter button.
This enables a dropdown general field bar that allows you to select more complex filters, such as excluding certain general fields from consideration.
From this dropdown, you can select multiple general fields to include or exclude by selecting the name of the general field (for assigned and predicted), or the individual buttons (including minus for where this general field is neither assigned nor predicted).
The result looks like the following example, which returns messages predicted to have the Invoice ID general field, but not the Prod ID general field assigned or predicted:
You can select + Add General field Filter multiple times to add additional layers to your query. Two separate general field filters create an AND type query, whilst multiple general fields selected in the same general field filter create an ANY OF type query.
In the example below, multiple general field filters have been applied individually. This creates a filter that will return messages predicted to have any of the three general fields in the first filter, but that also have the Policy Number general field predicted, and do not have the UK Postcode general field predicted or assigned.
A helpful tip is that by selecting the & sign in an individual filter containing multiple general fields, you can automatically split them out into individual filters. This changes the query from ANY OF (any of these general fields predicted) to AND (all of these general fields predicted).
You can combine filters from both the general field bar, and individually added general field filters. Filters applied in the general field bar are treated as an AND query with any individually applied general field filters.
For example, in the image below, this combined query would return any messages that had either ORDER ID or PROD ID predicted.
Combining general field bar filters and individually added general field filters.
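To make the combination rules above concrete, here is a minimal sketch of how these filter semantics compose, using a hypothetical message structure where each message carries a set of predicted general fields; the field names are taken from the examples above.

```python
# Illustrative sketch of the filter semantics described above: ANY OF within a
# single filter, AND across separate filters, NOT to exclude. The message
# structure is hypothetical.
messages = [
    {"id": 1, "fields": {"Order ID", "Prod ID"}},
    {"id": 2, "fields": {"Order ID", "Policy Number"}},
    {"id": 3, "fields": {"UK Postcode"}},
]

def matches(msg, any_of=(), all_of=(), none_of=()):
    fields = msg["fields"]
    return (
        (not any_of or any(f in fields for f in any_of))  # ANY OF within one filter
        and all(f in fields for f in all_of)              # AND across filters
        and not any(f in fields for f in none_of)         # NOT filter
    )

# "ANY OF Order ID / Prod ID, AND Policy Number, NOT UK Postcode"
result = [m["id"] for m in messages
          if matches(m, any_of={"Order ID", "Prod ID"},
                     all_of={"Policy Number"}, none_of={"UK Postcode"})]
print(result)  # [2]
```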
These filters also mean that you can apply general field filters and sort by a specific general field within a training mode.
Predicted general fields appear as colour-highlighted text, such as in the first line of the message depicted in the following image, with a different colour appearing for each different general field type. Once you confirm a general field, either by manually applying it or accepting a prediction, the general field will appear as highlighted text with a bold, darker outline as shown in the following image.
If a paragraph has had general fields assigned, dismissed, or applied, it will appear highlighted in grey, as shown in the body of the message from the following image.
When reviewing trainable general fields, remember that the platform will learn from both the general field values that you assign, as well as the context of where they appear within the communications, that is, the other language used around the values themselves.
The platform will consider the context of the language in the same paragraph as the general field value, as well as the single paragraphs (each separated by a new line) directly before and after the paragraph that the general field sits in.
When the platform predicts which general fields apply to a communication, it assigns each prediction a confidence score (%) to show how confident it is that the general field applies to the highlighted span of text. You can view a general field’s confidence score by hovering over the general field.
This confidence score is also made available via the API so that it can inform automated actions taken downstream.
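As a hedged example of what a downstream consumer might do with that score, the sketch below gates an automated action on a confidence threshold; the payload shape and threshold are hypothetical, not the actual API schema.

```python
# Hypothetical payload shape, illustrating only the idea of gating automation
# on the confidence score exposed by the API.
CONFIDENCE_THRESHOLD = 0.95  # example value; tune per general field and process

prediction = {"field": "Invoice ID", "value": "INV-0042", "confidence": 0.97}  # hypothetical

if prediction["confidence"] >= CONFIDENCE_THRESHOLD:
    print(f"Auto-process {prediction['field']} = {prediction['value']}")
else:
    print("Route to a human for review")
```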
Once general fields are enabled, the platform will automatically start predicting them within the messages throughout your dataset. For more details, check Reviewing and applying general fields. You can then accept the predictions that are correct, or reject those that are incorrect. Each of these actions sends training signals that are used to improve the platform's understanding of that general field.
For the pre-trained general fields that are trained offline, such as monetary quantity, URL, and so on, it is more important from an improvement perspective for you to reject or correct wrong predictions than it is for you to accept correct predictions.
For the general fields that train live in the platform, it is equally important to accept correct predictions as well as reject incorrect predictions. You do not, however, need to keep accepting many correct examples of each unique general field for these kinds if you aren't finding incorrectly predicted ones. For example, Example Bank Ltd. is a unique organization general field.
To review a general field prediction, hover the mouse over the prediction and the general field review modal will appear, as shown in the example in the following image. To accept the prediction, select Confirm; to reject it, select Dismiss.
You can train general fields and labels independently of each other. Reviewing labels for a message does not mean you have to review the general fields in that same message. However, it is good practice to do both at the same time, as the most efficient use of your time while model training.
To understand how well the platform can predict each general field enabled for a dataset, particularly the trainable ones, check Validation for general fields.
To apply a general field to some text where the platform has not predicted it, simply highlight the section of text as you would if you were going to copy it.
A dropdown menu will appear, as shown in the following image, containing all of the general fields that you have enabled for your dataset. Select the correct one to apply it, or press the corresponding keyboard shortcut.
The default keyboard shortcut for each general field is the letter it starts with. If more than one general field starts with the same letter, the remaining general fields are assigned other shortcuts at random.
Once a general field has been applied, it will be highlighted in colour with a bold outline as shown in the following image. Each general field type will have its own specific colour.
- Do not split words.
- Do not partially annotate paragraphs.
Do not split words
Make sure you do not split words as the highlighted general field should cover the entire word, or several, in question, not just part of it. Check the following images for an example of an incorrect and a correct application.
Do not partially annotate paragraphs
When annotating, if you assign one label to a message, you should apply all the labels that could apply to that message; otherwise, you teach the model that those other labels should not apply. The same is true for general fields, except that general fields are reviewed or applied at the paragraph level, rather than at the level of the whole message.
Paragraphs in a message are separated by new lines. The subject line of an email message is considered its own single paragraph.
Make sure to review or apply all of the general fields within a paragraph across all general field kinds if you review or apply one of them. Applying, accepting or rejecting general fields in a paragraph means that the paragraph is treated as reviewed by the platform from a general field perspective. Therefore, make sure to accept or reject all of the predictions in that paragraph.
The following example shows the different paragraphs that have been reviewed within the email message.
The message depicted in the following image shows the same example where the user has not accepted or rejected all of the general field predictions in a single paragraph. This is incorrect, as the model will falsely treat the monetary quantity general field as an incorrect prediction.
The platform displays validation statistics, warnings and recommended actions for enabled general fields in the Validation page, much like it does for every label in your taxonomy.
To see these, navigate to the Validation page and select the General fields tab at the top, as shown in the image below.
The process in which the platform validates its ability to correctly predict general fields is very similar to how it does it for labels.
Messages are split (80:20) into a training set and a test set (determined randomly by the message ID of each message) when they are first added to the dataset. Any general fields that have been assigned (predictions that were accepted or corrected) will fall into the training set or the test set, based on whichever set the message that they're in was assigned to originally.
As there can sometimes be a very large number of general fields in one message and no guarantee whether a message is in the training set or the test set, you may see a large disparity between the number of general fields in each set.
There may also be instances where all of the assigned general fields fall into the training set. As at least one example is required in the test set to calculate the validation scores, such a general field would require more assigned examples until some are present in the test set.
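The exact mechanism is not described here, but the sketch below illustrates the general idea of an 80:20 split that is a fixed function of the message ID, so a message always lands in the same set; the hashing approach is an assumption for illustration only.

```python
# Illustrative only: a deterministic 80:20 split keyed on the message ID, so
# each message always falls into the same set. The platform's actual mechanism
# may differ.
import hashlib

def split_for(message_id: str) -> str:
    digest = hashlib.sha256(message_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "train" if bucket < 80 else "test"  # roughly 80:20

for message_id in ["msg-001", "msg-002", "msg-003"]:
    print(message_id, "->", split_for(message_id))
```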
The individual precision and recall statistics for each general field with sufficient training data are calculated in a very similar way to that of labels:
Precision = No. of matching general fields / No. of predicted general fields
Recall = No. of matching general fields / No. of actual general fields
A 'matching general field' is one where the platform has predicted the general field exactly (i.e. no partial matches).
The F1 Score is simply the harmonic mean of both precision and recall.
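The following small worked example applies these formulas; the numbers are invented purely for illustration.

```python
# Worked example of the formulas above. "Matching" means the predicted span
# matches an annotated general field exactly (no partial matches).
def precision_recall_f1(num_matching: int, num_predicted: int, num_actual: int):
    precision = num_matching / num_predicted if num_predicted else 0.0
    recall = num_matching / num_actual if num_actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# e.g. 8 exact matches out of 10 predictions, with 12 annotated fields in the test set
print(precision_recall_f1(8, 10, 12))  # (0.8, 0.666..., 0.727...)
```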
It's worth noting that the precision and recall stats shown on this page are most useful for the general fields that are trainable live in the platform, as all of the general fields reviewed for these general field kinds will directly impact the platform's ability to predict those general fields.
Hence accepting correct general fields and correcting or rejecting wrong general fields should be done wherever possible.
For general fields that are pre-trained via template field types, in order for the validation statistics to provide an accurate reflection of performance, you need to accept a considerable number of correct predictions, as well as correct wrong ones.
If you were only to correct wrong predictions, the train and test sets would be artificially full of only the instances where the platform has struggled to predict a general field, and not those where it is better able to predict them. As correcting wrong predictions for these general fields does not lead to a real-time update (they are updated periodically offline), the validation statistics may not change for some time and could be artificially low.
Accepting lots of correct predictions may not always be convenient, as these general fields are predicted correctly far more often than not. But if the majority of the predictions are correct for these general fields, it's likely that you do not need to worry about their precision and recall stats in the Validation page.
The summary stats (average precision, average recall and average F1 score) are simply averages of each of the individual general field scores.
Like with labels, only general fields that have sufficient training data are included in the average scores. Those that do not have sufficient training data to be included have a warning icon next to their name.
The General fields tab of the Validation page shows the average general field performance statistics, as well as a chart showing the average F1 score of each general field versus its training set size. The chart also flags general fields that have amber or red performance warnings.
The general field performance statistics shown are:
- Average F1 Score: Average of F1 scores across all general fields with sufficient data to accurately estimate performance. This score weighs recall and precision equally. A model with a high F1 score produces fewer false positives and negatives.
- Average Precision: Average of precision scores across all general fields with sufficient data to accurately estimate performance. A model with high precision produces fewer false positives.
- Average Recall: Average of recall scores across all general fields with sufficient data to accurately estimate performance. A model with high recall produces fewer false negatives.
The general field performance chart shown in the Metrics tab of the Validation page gives an immediate visual indication of how each individual general field is performing. For more details, check the previous section.
For a general field to appear on this chart, it must have at least 20 pinned examples present in the training set used by the platform during validation. To ensure that this happens, users should make sure they provide a minimum of 25 (often more) pinned examples per general field from 25 different messages.
Each general field will be plotted as one of three colours, based on the model's understanding of how the general field is performing. Below, we explain what these mean:
General field performance indicators
- Blue - General fields plotted as blue on the chart have a satisfactory performance level. This is based on numerous contributing factors, including the number and variety of examples and the average precision for that general field.
- Amber - General fields plotted as amber have slightly less than satisfactory performance. They may have relatively low average precision or not quite enough training examples. These general fields require a bit of training / correction to improve their performance.
- Red - General fields plotted as red are poorly performing general fields. They may have very low average precision or not enough training examples. These general fields may require considerably more training / correction to bring their performance up to a satisfactory level.
Users can select individual general fields from the general field filter bar (or by selecting the general field's plot on the All general fields chart) in order to see the general field's performance statistics.
The specific general field view will also show any performance warnings and recommended next best action suggestions to help improve its performance.
The general field view will show the average F1 score for the general field, as well as its precision and recall.
Like training labels, training general fields is the process by which a user teaches the platform which general fields apply on a given message using various training modes.
Like with labels, the Teach, Check, and Missed modes are available to help train and improve the performance of general fields and can be accessed either 1) on the Explore page using the training dropdown, or 2) by following the recommended actions on the General fields tab of the Validation page.
If a specific general field has a performance warning, the platform recommends the next best action that it thinks will help address that warning, listed in order of priority. This will be shown when you select a specific general field from the taxonomy or the All general field chart.
The next best actions suggestions act as links that you can select to take you direct to the training view that the platform suggests in order to improve the general field's performance. The suggestions are intelligently ordered with the highest priority action to improve the general field listed first.
This is the most important tool to help you understand the performance of your general fields, and should regularly be used as a guide when trying to improve general field performance.
The following table summarises when the platform recommends each general field training mode:
Teach General field | Check General field | Missed General field |
---|---|---|
Recommended when a general field has a performance warning, a low F1 score, or little obvious context or high variation in its values. | Recommended when a general field has low recall but high precision. | Recommended when a general field has high recall but low precision. |
Using Teach General field boosts general field performance, because the model is being given new information on messages it is unsure about, as opposed to ones that it already has highly confident predictions for.
The platform recommends Teach General Fields when:
- There is a performance warning next to a general field, as shown in the following image. This occurs when the minimum of 25 examples has not been provided.
- The F1 score on a given general field is low.
- There may not always be obvious context within the text for a general field, or there is lots of variation within the general field values for a given type.
The following image contains an example of training a general field in Teach General Fields mode:
Using Check General field helps identify inconsistencies in the reviewed set, while improving the model's understanding of the general field, by ensuring that the model has correct and consistent examples to make predictions from. This will improve the recall of a general field.
The platform recommends Check General Fields when:
- There is low recall, but high precision.
- The predictions the platform makes are very accurate, but it misses many of the instances where the general field should be applied.
For more details on calculations for general field validation, check the Validation for general fields page.
Using Missed General field helps find examples in the reviewed set that should have the selected general field applied but do not. It also helps identify partially annotated messages, which can be detrimental to the model's ability to predict a general field. This will improve the precision of a general field and ensure the model has correct and consistent examples to make predictions from.
The platform recommends Missed General Field when:
- There is high recall, but low precision.
- The platform incorrectly predicts the general field a lot of the time, but when the general field does apply, it catches most of the examples that should be there.
For more details on calculations for general field validation, check the Validation for general fields page.
- You must have assigned the Dataset - Manage permission as an Automation Cloud user, or the Modify datasets permission as a legacy user.
- You can build custom Regex general fields through the Dataset settings, or through the Manage general fields option in the Generative Extraction field annotation experience, explained in detail in the Generative extraction page.
Use custom Regex general fields to extract and format spans of text that have a known repetitive structure, such as IDs or reference numbers.
This is a useful option for simple, structured general fields with little variation. For general fields with significant variation, or where the context has a big influence on predictions, a machine-learning based general field is the right choice. You can use combinations of the two in any dataset within Communications Mining™.
A broader Regex (i.e., set of rules to define the general field) can also be used as the base of a custom general field. This combines the rules with contextual, machine learning based refinement through training within Communications Mining to create sophisticated custom general fields. This provides the most optimal performance as well as the necessary restrictions on values extracted for automation.
A Custom Regex general field is made up of a field type with the Regex data type, which in turn has one or more custom Regex Templates. Each template expresses one way to extract (and format) the general field.
Combined together, these templates offer a flexible and powerful way to cover multiple representations of the same general field type.
A template is made of the following:
- The regex (regular expression), which describes the constraints that need to be met by a span of text to be extracted as a general field.
- The formatting, which expresses how to normalise the extracted string into a more standard format.
For instance, suppose your customer IDs are either the word ID followed by 7 digits, or an alphanumeric string of 9 characters. The following image shows what your two templates would look like:
Type-ahead validation
The regex is validated as you type. For example, typing ID\d{} will show:
The Custom Regex Template can be tested on text to ensure that it behaves as expected. Any general field that would be extracted with the Template will be shown in a list, with its value, as well as the position of the start and end characters.
For example, with the regex \d{4} and the formatting ID-{$}, the following test string will show one extraction:
The regex is the pattern used to extract general fields in the text. Check the syntax documentation.
Named capture groups can be used to identify a specific section of the extracted string for subsequent formatting. The names of the capture groups should be unique across all templates, and should only contain lowercase letters or digits.
Formatting can be provided to post-process the extracted general field.
By default, no formatting is applied and the string returned by the platform will be the string extracted by the regex. However, if needed, more complex transformations can be defined, using the following rules.
Variables
- A capture group is referenced in the formatting using the $ symbol. Note that the $ symbol by itself represents the full regex match.
- Formatting expressions are enclosed in { and } braces.
For example, if the extracted ID should be returned with the prefix ID-, then the regex and the formatting would be as shown below, and the general field returned by the platform would be ID-1234567.
String Operations
Strings can be concatenated using the & symbol.
Regex | (?P<id1>\b\d{3}\b)|(?P<id2>\b\d{4}\b) |
Formatting | {$id1 & "-" & $id2} |
Text | The first id is 123 and the second one is 4567 |
General Field returned by the platform | 123-4567 |
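To show the same idea outside the platform, here is a rough Python approximation of this template: a regex with named capture groups plus a formatting step that concatenates them with a dash. It is a sketch of the concept, not the platform's template engine.

```python
# Rough approximation of the template above: named capture groups plus a
# formatting step equivalent to {$id1 & "-" & $id2}. Not the platform's engine.
import re

pattern = re.compile(r"(?P<id1>\b\d{3}\b)|(?P<id2>\b\d{4}\b)")
text = "The first id is 123 and the second one is 4567"

groups = {}
for match in pattern.finditer(text):
    for name, value in match.groupdict().items():
        if value is not None:
            groups[name] = value
            # The extraction preview also reports character positions:
            print(name, value, "start:", match.start(), "end:", match.end())

formatted = f"{groups['id1']}-{groups['id2']}"
print(formatted)  # 123-4567
```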
Some functions can also be used in the formatting to transform the extracted string. The names of the functions and their signatures are inspired by Excel.
Converts all characters in the extracted span to uppercase:
Regex | \w{3} |
Formatting | {upper($)} |
Text | abc |
General Field returned by the platform | ABC |
Converts all characters in the extracted span to lowercase:
Regex | \w{3} |
Formatting | {lower($)} |
Text | AbC |
General Field returned by the platform | abc |
Capitalises the extracted span:
Regex | \w+\s\w+ |
Formatting | {proper($)} |
Text | albert EINSTEIN |
General Field returned by the platform | Albert Einstein |
Pads the extracted span up to a given size with a given character.
Function arguments:
- The text containing the characters to be padded
- Size of the padded string
- Character to be used for padding
Regex | \d{2,5} |
Formatting | {pad($, 5, "0")} |
Text | 123 |
General Field returned by the platform | 00123 |
Replaces characters with other characters.
Function arguments:
- The text containing the characters to be substituted
- What characters to replace
- What the old characters should be replaced with
Regex | ab |
Formatting | {substitute($, "a", "12")} |
Text | ab |
General Field returned by the platform | 12b |
Returns the first n characters from the span.
Function arguments:
- The text containing the characters to be extracted
- The number of characters to return
Regex | \w{4} |
Formatting | {left($, 2)} |
Text | ABCD |
General Field returned by the platform | AB |
Returns the last n characters from the span.
Function arguments:
- The text containing the characters to be extracted
- The number of characters to return
Regex | \w{4} |
Formatting | {right($, 2)} |
Text | ABCD |
General Field returned by the platform | CD |
Returns n characters from the span, starting at the specified (1-based) position.
Function arguments:
- The text containing the characters to be extracted
- The position of the first character to return
- The number of characters to return
Regex | \w{5} |
Formatting | {mid($, 2, 3)} |
Text | ABCDE |
General Field returned by the platform | BCD |
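For reference, the sketch below gives plain Python approximations of the formatting functions documented above, matching the documented examples; the platform's own behaviour remains the reference if the two ever differ.

```python
# Illustrative Python approximations of the documented formatting functions.
def upper(s):                 return s.upper()                   # upper("abc") -> "ABC"
def lower(s):                 return s.lower()                   # lower("AbC") -> "abc"
def proper(s):                return s.title()                   # proper("albert EINSTEIN") -> "Albert Einstein"
def pad(s, size, char):       return s.rjust(size, char)         # pad("123", 5, "0") -> "00123"
def substitute(s, old, new):  return s.replace(old, new)         # substitute("ab", "a", "12") -> "12b"
def left(s, n):               return s[:n]                       # left("ABCD", 2) -> "AB"
def right(s, n):              return s[-n:]                      # right("ABCD", 2) -> "CD"
def mid(s, start, n):         return s[start - 1:start - 1 + n]  # mid("ABCDE", 2, 3) -> "BCD" (1-based start)

print(upper("abc"), lower("AbC"), proper("albert EINSTEIN"), pad("123", 5, "0"),
      substitute("ab", "a", "12"), left("ABCD", 2), right("ABCD", 2), mid("ABCDE", 2, 3))
```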