Activities - Generative classifier

Document Understanding activities

Generative classifier - Good practices

Good practices for classifying documents using generative models in IntelligentOCR workflows.

The generative classifier lets you classify documents using generative models. This page collects tips on getting the most out of workflows that use it.

Classifying a large number of documents

Suppose you have a large number of contracts that you need to sort into different categories. To optimize this process with the generative classifier, follow the good practices outlined on this page.

Optimizing your input prompts

To optimize your input prompts, provide as much context as possible, including a detailed description of each document type. For instance, the following text can be used to describe an invoice: “An invoice is a document issued by a seller to a buyer, detailing products or services provided, their quantities, and prices. It includes the seller's and buyer's details, invoice number, date, total amount due, and payment terms. Invoices are used for requesting payments and record-keeping in business transactions.”

For the generative model to work effectively, give it extensive context. Brief, vague document type descriptions can lead to obvious classification errors.
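As a rough illustration of reviewing descriptions before a run, the sketch below flags document types whose description looks too short to carry useful context. It is written in Python purely for illustration and is not UiPath code. The `find_vague_descriptions` helper, the 20-word threshold, and the "Contract" entry are all assumptions; word count is only a crude proxy for how much context a description provides.

```python
# Hypothetical pre-check: flag document type descriptions that are likely too thin.
MIN_WORDS = 20  # assumed threshold, not a documented limit


def find_vague_descriptions(descriptions: dict[str, str],
                            min_words: int = MIN_WORDS) -> list[str]:
    """Return the document types whose description has fewer than min_words words."""
    return [
        doc_type
        for doc_type, text in descriptions.items()
        if len(text.split()) < min_words
    ]


descriptions = {
    # Detailed description, taken from the invoice example on this page.
    "Invoice": (
        "An invoice is a document issued by a seller to a buyer, detailing "
        "products or services provided, their quantities, and prices. It "
        "includes the seller's and buyer's details, invoice number, date, "
        "total amount due, and payment terms."
    ),
    # Deliberately vague description, invented for this example.
    "Contract": "A legal document.",
}

vague = find_vague_descriptions(descriptions)
```

Here `vague` contains only the "Contract" entry, which is the kind of description worth expanding before classifying at scale.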

Optimizing your workflow

To optimize your workflow, start by creating a folder for classified files. Moving each file there once it is classified prevents the same file from being classified again.
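The idea of moving classified files out of the input folder can be sketched as follows. This is a Python illustration of the logic only, not UiPath code. The function names and the choice of one subfolder per document type are assumptions.

```python
import shutil
from pathlib import Path


def move_classified(file_path: Path, classified_root: Path, doc_type: str) -> Path:
    """Move a classified file into classified_root/<doc_type>/ and return its new path."""
    target_dir = classified_root / doc_type
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_path.name
    shutil.move(str(file_path), str(target))
    return target


def pending_files(input_dir: Path) -> list[Path]:
    """List the files still waiting in the input folder, in a stable order."""
    return sorted(p for p in input_dir.iterdir() if p.is_file())
```

Because processed files leave the input folder, a rerun that iterates over `pending_files` only sees documents that have not been classified yet.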

Create a sample set of documents before automating a larger data set. This sample set should include corrupted and password-protected PDFs so that the workflow is tested against them. As a good practice, include a Try Catch activity in the workflow to handle the failures that corrupted or password-protected PDF files can cause. Once the Try Catch activity is in place, test the workflow on the sample set to confirm that it handles these files.
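The Try Catch pattern described above can be sketched as a loop that records failures instead of stopping at the first bad file. This is a Python illustration of the control flow only, not UiPath code. The `classify_all` function and the `classify` callable passed into it are hypothetical stand-ins for the classification step.

```python
from pathlib import Path
from typing import Callable, Iterable


def classify_all(
    files: Iterable[Path],
    classify: Callable[[Path], str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Classify each file; collect errors per file rather than aborting the run."""
    results: dict[str, str] = {}
    failures: dict[str, str] = {}
    for path in files:
        try:
            results[path.name] = classify(path)
        except Exception as exc:  # broad on purpose, like a general Try Catch
            # A corrupted or password-protected PDF would end up here.
            failures[path.name] = str(exc)
    return results, failures
```

Running this over the sample set gives a list of failed files to inspect, while the remaining documents are still classified.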

In the workflow, cache the digitization results (the document text and the document object model) so that repeated test runs on the same file do not digitize it again.
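One way to picture such a cache is a lookup keyed on the file's contents, so an unchanged file is digitized only once. The sketch below is Python for illustration only, not UiPath code. The `cached_digitize` function, the `digitize` callable, the JSON storage format, and the SHA-256 key are all assumptions; the real digitization output would need its own serialization.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_digitize(
    file_path: Path,
    cache_dir: Path,
    digitize: Callable[[Path], dict],
) -> dict:
    """Return the cached digitization result for a file, computing it on first use."""
    # Key the cache on file contents so a renamed but unchanged file still hits it.
    digest = hashlib.sha256(file_path.read_bytes()).hexdigest()
    cache_file = cache_dir / f"{digest}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = digitize(file_path)  # the slow step, skipped on later runs
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Keying on a content hash rather than the file name means that editing a file invalidates its cache entry automatically.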
