Use the Train Classifier action to create a model file that the Classify action uses to sort input documents into the required categories.
Prerequisites
Before building the bot, collect example documents and categorize them
into folders. Ensure the set of example documents meets the following
requirements:
If these minimum requirements are not met, an error message
is displayed during bot runtime.
Each folder contains a representative sample of the documents that the
associated learning instance will process. The Train Classifier action
reads the files in each folder and builds a model based on the documents
stored there.
Note: Because the ABBYY FineReader Engine OCR is now upgraded from version 12.2
to version 12.4, older .icmf files cannot be used to retrain models in
Automation 360 v.24 of the Document Classifier package. If you want to add
more categories or more files to your existing categories, you must create
a new model.
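Before running the bot, the folder layout described above can be checked with a short script. The following is a minimal sketch, not part of the Document Classifier package: the function names `summarize_training_folder` and `find_empty_categories` are hypothetical, and the script only counts files per category folder. It does not verify the package's minimum requirements.

```python
from pathlib import Path


def summarize_training_folder(root):
    """Return {category_name: file_count} for each subfolder of root."""
    root = Path(root)
    if not root.is_dir():
        raise NotADirectoryError(f"Training folder not found: {root}")
    summary = {}
    for sub in sorted(root.iterdir()):
        if sub.is_dir():
            # Count only regular files directly inside the category folder.
            summary[sub.name] = sum(1 for f in sub.iterdir() if f.is_file())
    return summary


def find_empty_categories(summary):
    """List category folders that contain no example documents."""
    return [name for name, count in summary.items() if count == 0]
```

Calling `summarize_training_folder` on the training folder and passing the result to `find_empty_categories` gives a quick list of category folders that still need example documents.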
Procedure
-
In the Actions palette, double-click or drag the Train Classifier action from the Document Classifier package.
-
Click Train to create a new model file.
- Optional:
If you have an existing model file, click Re-Train.
-
Use the Training folder path field to select an existing folder path from the Desktop folder tab.
Alternatively, click the Variable tab to manually enter an existing training folder path.
-
Use the Existing zip path field to select the file path of the .zip folder from the Control Room file or Desktop file tab.
Alternatively, click the Variable tab to manually enter the path for the .zip folder.
Note: When you train documents, a .zip folder is created that contains .icmf, .data, and .properties files. Ensure that you upload the entire .zip folder when you retrain an existing model file.
-
Select the input folder path from Desktop folder or Variable.
The input folder must contain subdirectories whose names correspond to the
categories of documents that you want to train the classifier on.
For example, if you have sales-related documents, the input folder must
contain subfolders such as Invoice and Purchase Order.
- Optional:
If you select Desktop file, click Browse to change the default file path.
-
Enter a name for the model file in the Model name field.
-
Use the Model output path field to select the directory
for the output model file.
- Optional:
Configure the following ADVANCED SETTINGS:
-
Training Optimization: Use the drop-down menu to
select the type of training optimization.
-
Precision: Select this option when you want the model's category
assignments to be precise, even if it misses a few documents.
-
Recall: Select this option when you want the model to find all the
relevant cases within a dataset.
-
F1 score: Selected by default and the recommended setting, because it
combines the training optimization of both Precision and Recall.
-
Classification Type: Use the drop-down menu to select the features you
want to include: text, image, or both.
Text and image is selected by default. If you select Text or Text and
image, a list of supported languages is displayed in the Recognition
Language drop-down menu.
-
OCR Settings: The Extract all text blocks and Extract text from images
options are enabled by default.
With these options enabled, OCR takes more time to extract the content.
In return, relatively lower-quality documents are also handled based on
the OCR output.
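The Training Optimization options above are named after standard classification metrics. The following sketch shows the textbook definitions of precision, recall, and F1 score computed from counts for a single category. The function name `precision_recall_f1` is hypothetical, and the sketch only illustrates what the terms mean. How the Document Classifier package applies them internally during training is not described here.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 score for one category.

    true_positives:  documents correctly assigned to the category
    false_positives: documents wrongly assigned to the category
    false_negatives: documents of the category that were missed
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Precision: share of assigned documents that really belong.
    precision = true_positives / predicted if predicted else 0.0
    # Recall: share of the category's documents that were found.
    recall = true_positives / actual if actual else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, with 8 correct assignments, 2 wrong assignments, and 4 missed documents, precision is 0.8 and recall is about 0.67. The F1 score sits between the two at about 0.73.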
-
Click Save and Run.
When you retrain an existing model, the already trained data is fetched and
combined with new data generated from the text or layout features of the
input documents. The machine learning model is then trained from scratch.
This method saves the time needed to regenerate text data or layout data
for already trained documents. However, training the machine learning model
is the computationally expensive part, so retraining is still expected to
be time-consuming. If this becomes a constraint, we recommend that you
create additional model files and use them for additional training and
classification.
The model is created as a .icmf file in the directory specified in the Model output path field.
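Because retraining requires the entire .zip folder produced by training, it can help to confirm that an archive still contains all three file types before you upload it. The following is a minimal sketch using the Python standard library. The function name `missing_model_parts` is hypothetical, and the check looks only at file extensions. It does not confirm that the package will accept the archive.

```python
import zipfile
from pathlib import PurePosixPath

# File types that the training output is expected to contain.
REQUIRED_SUFFIXES = {".icmf", ".data", ".properties"}


def missing_model_parts(zip_source):
    """Return the required file extensions not found in the archive.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        found = {
            PurePosixPath(name).suffix.lower()
            for name in archive.namelist()
        }
    return sorted(REQUIRED_SUFFIXES - found)
```

An empty list means that all three file types are present. Any extensions in the returned list are missing from the archive.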
Next steps
After creating the model, build a bot to classify input documents. See Using Classify action.