Machine Learning Extractor Trainer
UiPath.DocumentUnderstanding.ML.Activities.MachineLearningExtractorTrainer
Enables the collection of data that has been processed through Validation Station so that it can be imported into Document Manager. This activity can be used only within the Train Extractors Scope activity.
Designer panel
Local Storage
- Output Folder - The directory where the collected data is stored. Once the data is stored, it can be imported into machine learning training tools.
Select Private Dataset for Project
- Dataset - The dataset where the training data can be uploaded. If the robot is connected to a tenant that has AI Center enabled, the dropdown menu lists all the datasets from AI Center, and you can select the folder where the validated documents are uploaded.
- Project - The project where the training data can be uploaded.
Note: Project and dataset selection are enabled only when connected to Orchestrator. Visit Managing datasets for more information about Public/Private Datasets.
Provide Public Dataset Endpoint
- Dataset ApiKey - The authentication key of the dataset.
- Dataset Endpoint - The endpoint of the dataset where training data can be uploaded. Once a dataset is public, it can be accessed from outside the UiPath® environment through an endpoint and an API key. Use this option to upload data to an AI Center instance that you are not connected to (for example, in hybrid deployments where AI Center is in the cloud and the robot is connected to an on-premises tenant).
Properties panel
Common
- DisplayName - The display name of the activity.
Local Storage
- Output Folder - The directory where the collected data is stored. Once the data is stored, it can be imported into machine learning training tools.
Misc
- Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Provide Public Dataset Endpoint
- Dataset ApiKey - The authentication key of the dataset.
- Dataset Endpoint - The endpoint of the dataset where training data can be uploaded. Once a dataset is public, it can be accessed from outside the UiPath® environment through an endpoint and an API key. Use this option to upload data to an AI Center instance that you are not connected to (for example, in hybrid deployments where AI Center is in the cloud and the robot is connected to an on-premises tenant).
Select Private Dataset for Project
- Dataset - The dataset where the training data can be uploaded. If the robot is connected to a tenant that has AI Center enabled, the dropdown menu lists all the datasets from AI Center, and you can select the folder where the validated documents are uploaded.
- Project - The project where the training data can be uploaded.
Note: Project and dataset selection are enabled only when connected to Orchestrator. Visit Managing datasets for more information about Public/Private Datasets.
Server
- RetryOnFailure - Retry on transient failure. This field only supports Boolean values (True, False). The default value is True.
- Timeout (milliseconds) - Specifies the amount of time (in milliseconds) to wait for a response from the server before an error is thrown. The default value is 100000 milliseconds (100 seconds).
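As a rough illustration of how RetryOnFailure and Timeout interact, the following Python sketch retries an operation that fails transiently and passes it a timeout converted from milliseconds. This is not the activity's implementation: the function name `call_with_retry`, the retry count, the delay, and the choice of which exceptions count as transient are all assumptions made for the example.

```python
import time


def call_with_retry(operation, timeout_ms=100_000, retries=3, delay_s=0.0):
    """Call operation(timeout_seconds), retrying on transient errors.

    Hypothetical helper: the retry count, the delay, and the set of
    "transient" exceptions are assumptions, not documented behavior.
    """
    timeout_s = timeout_ms / 1000  # 100000 ms -> 100 seconds
    last_error = None
    for attempt in range(1 + retries):
        try:
            return operation(timeout_s)
        except (TimeoutError, ConnectionError) as error:
            last_error = error
            if attempt < retries:
                time.sleep(delay_s)  # wait before the next attempt
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With `retries=0`, the sketch behaves like RetryOnFailure set to False: the first transient error is raised immediately.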
The Machine Learning Extractor Trainer collects the human feedback for you in a directory of your choice. Once you have collected data and want to retrain an ML model, zip the contents of the directory and upload the archive to Document Manager, where you can gather and filter the data.
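Zipping the collected data can be scripted. The following is a minimal Python sketch using the standard library; the function name `zip_output_folder` and the paths in the usage note are hypothetical, and the upload to Document Manager is still done separately.

```python
import shutil
from pathlib import Path


def zip_output_folder(output_folder: str, archive_stem: str) -> Path:
    """Zip everything inside output_folder and return the archive path.

    archive_stem is the archive path without the ".zip" extension. Keep it
    outside output_folder so the archive does not end up containing itself.
    """
    source = Path(output_folder)
    if not source.is_dir():
        raise NotADirectoryError(f"Output folder not found: {source}")
    # make_archive appends ".zip" to the stem and returns the final path.
    archive = shutil.make_archive(archive_stem, "zip", root_dir=source)
    return Path(archive)
```

For example, `zip_output_folder("C:/DU/TrainerOutput", "C:/DU/training_data")` would produce `C:/DU/training_data.zip`, assuming the first path is the value you entered for Output Folder.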
To use the Machine Learning Extractor Trainer activity, perform the following steps:
- Use the Taxonomy Manager Wizard to define your document types and fields.
- Add a Machine Learning Extractor Trainer into a Train Extractors Scope activity.
- In the Machine Learning Extractor wizard that automatically opens, enter information for the Endpoint field. You can choose one of the public endpoints. Visit Public endpoints for more information about public endpoints.
- Select the Update activity arguments check box if you also want to use the entered values as input arguments for the activity, specifically for the Endpoint.
- Select Get Capabilities.
The wizard closes after this operation.
- Enter a value for Output Folder.
- Select the Configure Extractors option in the Train Extractors Scope.
A wizard is displayed.
Figure 1. The Configure Extractors wizard
- The Machine Learning Extractor Trainer is now ready for configuration. Expand the document type you want to apply it to, then select the checkboxes next to the fields you want to train.
- Fill in the text boxes either manually or by selecting, from the available dropdown list, the correct data you wish to map to each field. The dropdown list contains all fields that the Machine Learning Extractor Trainer, using the endpoint entered in the Machine Learning Extractor wizard, declares as extraction capability.
Note: If you select the check box but leave the text box empty, the text box is automatically filled in with the Document Type ID from the local taxonomy. The changes apply after saving. To avoid using a long string for the field ID, we recommend entering a value manually if you do not have access to the internal taxonomy of the extractor.
- To check whether you are using the latest capabilities of the extractor, select Get or refresh extractor capabilities, which opens the Machine Learning Extractor wizard.
- Selecting one of the options from a dropdown list automatically confirms that field.
- To train an extractor based on its extraction result, set the Framework Alias field to the exact alphanumeric value previously used for that extractor.
- Select Save once all fields are configured properly.
Important: You cannot choose the same option for two distinct fields.
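The uniqueness rule above can be checked before saving if you keep your intended mapping in a script or spreadsheet. The sketch below is only an illustration: the function `find_duplicate_mappings` and the field names in the usage note are hypothetical, and the wizard itself enforces the rule independently.

```python
from collections import defaultdict


def find_duplicate_mappings(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Return each extractor option that more than one taxonomy field maps to.

    mapping: taxonomy field name -> option chosen from the dropdown list.
    An empty result means every option is used by at most one field.
    """
    fields_by_option = defaultdict(list)
    for taxonomy_field, option in mapping.items():
        fields_by_option[option].append(taxonomy_field)
    return {
        option: fields
        for option, fields in fields_by_option.items()
        if len(fields) > 1
    }
```

For instance, mapping both a hypothetical "Invoice Number" field and a hypothetical "PO Number" field to the same option `invoice-no` would be reported as `{"invoice-no": ["Invoice Number", "PO Number"]}`.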
Document Understanding Integration
The Machine Learning Extractor Trainer activity is part of the Document Understanding solutions. Visit the Document Understanding Guide for more information.