Using the Classify action

The IQ Bot Classify action groups the pages of an input document into categories, based on the model file that you created with the IQ Bot Train Classifier action.

Prerequisites

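Ensure that you have a model file created with the IQ Bot Train Classifier action. In this procedure, you build a bot that places the Classify action inside a Loop action so that each file in the selected folder is classified in turn.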
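The loop built in this procedure assigns each file's name and extension to a dictionary variable and joins them into the path passed to the Input file field. As a rough illustration outside Automation Anywhere, the following Python sketch reproduces that path construction. It is not the product's implementation: the helper names `iterate_folder` and `input_file_path` are hypothetical, and the sketch assumes the `extension` key holds the extension without its leading period, which is why the path template adds one. A file with no extension would produce a trailing period in this sketch.

```python
import os
from pathlib import Path
from typing import Dict, Iterator


def iterate_folder(folder: str) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per file, mimicking the loop's file variable.

    Assumption: 'extension' is stored without the leading period.
    Subfolders are skipped; only regular files are returned.
    """
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_file():
            yield {"name": entry.stem, "extension": entry.suffix.lstrip(".")}


def input_file_path(folder: str, dict_file: Dict[str, str]) -> str:
    """Rebuild the full path, like C:\\input\\$dictFile(name)$.$dictFile(extension)$."""
    # The period between name and extension must be added explicitly.
    return os.path.join(folder, f"{dict_file['name']}.{dict_file['extension']}")
```

For a folder containing invoice.pdf, `iterate_folder` yields `{"name": "invoice", "extension": "pdf"}`, and `input_file_path` turns that entry back into the full path to invoice.pdf.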

Procedure

  1. In the Actions palette, double-click or drag the Loop action from the Loop package.
  2. In the Loop Type field, select the Iterator option.
  3. In the Iterator field, select For each file in folder from the drop-down list.
  4. In the Folder path field, select the path to the folder that contains the input files.
  5. In the Assign file name and extension to this variable field, create or select a dictionary variable to hold the name and extension of each file in the selected folder path as the loop iterates.
    This example uses a dictionary variable named dictFile.
  6. In the Actions palette, double-click or drag the Classify action from the Document Classifier package.
  7. In the Input file field, enter a dynamic file path using a variable.
    1. Add a file path pointing to the folder, for example C:\input\.
    2. Add the dynamic file name string: $dictFile(name)$.$dictFile(extension)$.
      Note: Be sure to include a period between the variable holding the file name and the one holding the extension.
    The name and extension keys are predefined. Because the Classify action runs inside the loop, the bot iterates through the entire folder and processes the files one at a time. The Input file value looks like this: C:\input\$dictFile(name)$.$dictFile(extension)$
  8. In the Classifier field, provide the file path to the model file.
    You can select either the .zip file or the .icmf file extracted from it.
    Note: For better classification results and performance, we recommend that you use the .icmf file from the .zip file generated by the Train Classifier action.
  9. In the Output folder path field, specify the folder in which to save the classification output.
  10. Optional: Configure the following ADVANCED SETTINGS:
    • Confidence threshold (%): If the confidence of the category prediction for a page is less than this threshold, the page is moved to the Unclassified folder.
    • Save classification output variable: Save the classification results as a list of dictionaries with the following keys:
      • fileName
      • pageIndex
      • category
      • confidence
    Note:
    • You can select the type of classification in the Document Classifier:
      • Image-based classification
      • Text-based classification
      • Both image and text-based classification
    • To set a higher confidence threshold, we suggest that you calculate the threshold when the document pages are similar. To determine the required threshold, review the confidence values in the classification output.
    • The Document Classifier automatically detects the document language for classification and supports the same languages as ABBYY, an optical character recognition (OCR) application.
  11. Click Save and Run.
    The classified pages are saved in subfolders of the output folder, one for each category defined in the model file. Any previously classified documents in the output folder are overwritten.
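If you enable the Save classification output variable option, you may want to inspect the results before settling on a confidence threshold. The following minimal Python sketch assumes each result is a dictionary with the fileName, pageIndex, category, and confidence keys described in the advanced settings, that confidence is a percentage directly comparable to the Confidence threshold (%) value, and that category holds the raw prediction rather than Unclassified. The sample values and the helper names `route_pages` and `lowest_confidence_per_category` are hypothetical. `route_pages` mirrors the documented rule that pages below the threshold go to the Unclassified folder, and `lowest_confidence_per_category` summarizes confidence values per category.

```python
from collections import defaultdict
from typing import Dict, List, Union

# Hypothetical shape of one entry in the classification output variable.
Result = Dict[str, Union[str, int, float]]

UNCLASSIFIED = "Unclassified"

# Made-up sample data, for illustration only.
SAMPLE_RESULTS: List[Result] = [
    {"fileName": "batch1.pdf", "pageIndex": 0, "category": "Invoice", "confidence": 92.5},
    {"fileName": "batch1.pdf", "pageIndex": 1, "category": "Invoice", "confidence": 61.0},
    {"fileName": "batch1.pdf", "pageIndex": 2, "category": "PurchaseOrder", "confidence": 88.0},
]


def route_pages(results: List[Result], threshold: float) -> Dict[str, List[Result]]:
    """Group pages by category, sending low-confidence pages to Unclassified.

    A page whose confidence is less than the threshold is treated as
    unclassified; a page exactly at the threshold keeps its category.
    """
    folders: Dict[str, List[Result]] = defaultdict(list)
    for row in results:
        if float(row["confidence"]) < threshold:
            folders[UNCLASSIFIED].append(row)
        else:
            folders[str(row["category"])].append(row)
    return dict(folders)


def lowest_confidence_per_category(results: List[Result]) -> Dict[str, float]:
    """Return the lowest confidence seen for each predicted category."""
    lowest: Dict[str, float] = {}
    for row in results:
        category = str(row["category"])
        confidence = float(row["confidence"])
        lowest[category] = min(confidence, lowest.get(category, confidence))
    return lowest
```

With a threshold of 70, `route_pages(SAMPLE_RESULTS, 70)` keeps pages 0 and 2 in their predicted categories and moves page 1 to Unclassified. Comparing the per-category minimums against a candidate threshold shows which categories would lose pages to the Unclassified folder.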

Next steps

You can use each subfolder of similar documents to create and train a learning instance to extract data from the documents. See Create a learning instance.