Using the Train Advanced Classifier action

Use the Train Advanced Classifier action to create a model file. The Classify Document, Classify Pages, and Split Document actions use this model file to sort input documents into the required categories.

Prerequisites

Before building the bot, collect example documents and categorize them into folders. Ensure the set of example documents meets the following requirements:
  • Contains at least two categories.
  • Contains at least 15 files per category; 20 files per category is recommended.
  • There is no limit on the number of categories. However, classification performance can decline as the training data set and the resulting model grow, so keep the number of categories to about 150 or fewer per model file.
  • The supported file formats are as follows:
    • .tiff
    • .bmp
    • .jpeg
    • .png
    • .pdf
    • .txt
  • We recommend that you provide images with a resolution of 300 dpi (dots per inch). The minimum acceptable resolution is 200 dpi.
Note: If these minimum requirements are not met, an error message is displayed at bot runtime.
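
The requirements above can be checked before the bot runs. The following is a minimal sketch in Python, not part of the product: the function name check_training_folder is hypothetical, the folder is assumed to be laid out as one subfolder per category, and the extension set adds the common spellings .tif, .bmp, and .jpg as an assumption. Image resolution is not checked, and any rule file inside a category folder is counted like any other file.

```python
from pathlib import Path

# Thresholds taken from the prerequisites above.
MIN_CATEGORIES = 2
MIN_FILES_PER_CATEGORY = 15
RECOMMENDED_FILES_PER_CATEGORY = 20
# Assumption: the listed formats plus their common alternative spellings.
SUPPORTED_EXTENSIONS = {".tiff", ".tif", ".bmp", ".jpeg", ".jpg", ".png", ".pdf", ".txt"}


def check_training_folder(root):
    """Return (errors, warnings) for a folder laid out as root/<category>/<files>."""
    root = Path(root)
    errors, warnings = [], []

    # Each immediate subfolder is treated as one category.
    categories = sorted(p for p in root.iterdir() if p.is_dir())
    if len(categories) < MIN_CATEGORIES:
        errors.append(
            f"found {len(categories)} categories, need at least {MIN_CATEGORIES}"
        )

    for category in categories:
        files = [f for f in category.iterdir() if f.is_file()]
        supported = [f for f in files if f.suffix.lower() in SUPPORTED_EXTENSIONS]

        unsupported_count = len(files) - len(supported)
        if unsupported_count:
            warnings.append(
                f"{category.name}: {unsupported_count} file(s) with an unsupported extension"
            )

        if len(supported) < MIN_FILES_PER_CATEGORY:
            errors.append(
                f"{category.name}: {len(supported)} supported files, "
                f"need at least {MIN_FILES_PER_CATEGORY}"
            )
        elif len(supported) < RECOMMENDED_FILES_PER_CATEGORY:
            warnings.append(
                f"{category.name}: {len(supported)} supported files, "
                f"{RECOMMENDED_FILES_PER_CATEGORY} recommended"
            )

    return errors, warnings
```

Calling check_training_folder on the training folder returns two lists of messages. An empty errors list means the category count and per-category file minimums are met under the assumptions stated above.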

Procedure

  1. In the Actions palette, double-click or drag the Train Advanced Classifier action from the Advanced Classifier package.
  2. Enter a name for the model file in the Model name field.
  3. In Training folder path, select Desktop folder or Variable. The input folder must contain subdirectories whose names correspond to the document categories that you want to train the classifier on. For example, for sales-related documents, the input folder might contain subfolders such as Invoice and Purchase Order.
  4. Optional: If you select Desktop folder, click Browse to change the default folder path. For example, C:\Users\Dave\BankStatement\TrainingData
  5. Use the Model output path field to select the directory for the output model file.
  6. In the License field, provide a license credential.
  7. If you select the Credential option, click Pick to get a license from the license locker.
  8. In the Document Split Training field:
    If you select Disabled:
    1. In Advanced Settings, choose the Classification Type from the drop-down menu based on the type of classifier you want to build:
      • Visual Classifier
      • Content Classifier
      • Visual and Content Classifier
    2. Optional: Add the Text Rules.
    If you select Enabled:
    1. You will see the following options:
      • Merge Unknown Document - Unchecked, by default.
      • Unknown Page Threshold - 30 percent, by default.
      • Split Confidence Threshold (or Separation Split Threshold) - 70 percent, by default.
    2. In Advanced Settings, choose the Classification Type from the drop-down menu based on the type of classifier you want to build:
      • Visual Classifier
      • Content Classifier
      • Visual and Content Classifier
    3. Optional: Add the Text Rules.
    Note: There must be only one rule file per category. If a rule file is placed outside the category folders, the following error message is displayed:

    Invalid rule file location

  9. Click Save and Run.
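
The procedure relies on the training folder having one subfolder per category, with any rule file kept inside a category folder. The following is a minimal sketch in Python, not part of the product, that lists the category names implied by the folder layout and any files sitting directly in the training root, which is where a misplaced rule file would show up. The function name describe_training_layout is hypothetical.

```python
from pathlib import Path


def describe_training_layout(root):
    """Return (category_names, loose_files) for a training folder.

    category_names: names of the immediate subfolders, one per category.
    loose_files: names of files placed directly in the root, outside
    every category folder.
    """
    entries = sorted(Path(root).iterdir())
    category_names = [p.name for p in entries if p.is_dir()]
    loose_files = [p.name for p in entries if p.is_file()]
    return category_names, loose_files
```

If loose_files is not empty, move those files into the appropriate category subfolder before running the bot.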

Next steps

After creating the model, build a bot to classify input documents. For more information, see Using the Classify Document action.