The IQ Bot Classify action groups the
pages of an input document based on the model file that was created using the IQ Bot Train Classifier action.
Build a bot with the Classify
action within a Loop
action to iteratively classify each file in the selected folder.
Procedure
-
In the Actions palette, double-click or drag the Loop action from the Loop package.
-
In the Loop Type field, select the
Iterator option.
-
In the Iterator field, select For each file
in folder from the drop-down list.
-
In the Folder path field, select the path to the folder
that contains the input files.
-
In the Assign file name and extension to this variable
field, create or select a dictionary variable to store the names and extensions
of the files in the selected folder path.
For this example, we will use a dictionary variable named dictFile.
-
In the Actions palette, double-click or drag the Classify action from the Document Classifier package.
-
In the Input file field, enter a dynamic file path using
a variable.
-
Add a file path pointing to the folder, for example
C:\input\.
-
Add the dynamic file name string:
$dictFile(name)$.$dictFile(extension)$.
Note: Be sure to include a period between the variable holding the file
name and the one holding the extension.
The name and extension keys are predefined. When the action runs
inside the loop, it iterates through the entire folder and
processes the files one at a time. The resulting Input
file value looks like this:
C:\input\$dictFile(name)$.$dictFile(extension)$
-
In the Classifier field, provide the file path to the
model file.
You can either select the .zip folder or extract the .icmf file from this folder and select it.
Note: For better classification results and
performance, we recommend that you use the .icmf
file available in the .zip folder obtained from the
Train Classifier
action.
-
Use the Output folder path option to specify where the
classification output document is saved.
- Optional:
Configure the following ADVANCED SETTINGS:
-
Confidence threshold (%): If the confidence value
of the category prediction for a page is less than the confidence
threshold, the page is moved to the Unclassified
folder.
-
Save classification output variable: Save the
classification results as a list of dictionaries with the following
keys:
- fileName
- pageIndex
- category
- confidence
Note:
- You can select the type of classification in the Document Classifier:
- Image-based classification
- Text-based classification
- Both image and text-based classification
- If you want to set a higher confidence threshold, for example
because the document pages are similar, we suggest that you calculate
the value instead of estimating it. To determine the required confidence
threshold, review the confidence values from the classification output.
- The Document Classifier can auto-detect the
language for classification, and supports all languages supported by
ABBYY (an optical character recognition application).
-
Click Save and Run.
The pages from the output document are saved in the respective
subfolders, based on the categories created in the model file. Any
previously classified documents in the output folder are
overwritten.
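The loop and the dynamic Input file value configured in this procedure can be pictured with a short Python sketch. It only illustrates the string substitution and is not how the bot is implemented: the function name build_input_paths, the sample file names, and the way a file name is split into name and extension are assumptions made for this example.

```python
from pathlib import PureWindowsPath


def build_input_paths(folder, file_names):
    """Return one input path per file, composed the way the loop composes it."""
    paths = []
    for file_name in file_names:
        parsed = PureWindowsPath(file_name)
        # Hypothetical stand-in for the dictFile variable and its predefined keys.
        dict_file = {
            "name": parsed.stem,                   # text before the last period
            "extension": parsed.suffix.lstrip("."),  # text after the last period
        }
        # Same shape as C:\input\$dictFile(name)$.$dictFile(extension)$,
        # including the period between the two values.
        paths.append(f"{folder}{dict_file['name']}.{dict_file['extension']}")
    return paths


# Sample file names are invented for illustration.
paths = build_input_paths("C:\\input\\", ["invoice_01.pdf", "claim.form.tiff"])
for path in paths:
    print(path)
```

Each loop iteration yields one complete path, which is why the folder prefix must end with a backslash and the period must be typed between the two variable references.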
Next steps
You can use each subfolder of similar documents to create and train a learning
instance to extract data from the documents. See Create a learning instance.
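Before you train learning instances on the subfolders, it can help to review the confidence values saved through the Save classification output variable setting. The following Python sketch assumes that the variable has been exported as a list of dictionaries with the four documented keys. The sample values, the 0 to 100 confidence scale, the zero-based pageIndex, and the route_pages helper are hypothetical and only mirror the threshold rule described above.

```python
from collections import defaultdict


def route_pages(results, threshold_percent):
    """Group pages by category; pages below the threshold go to Unclassified."""
    folders = defaultdict(list)
    for row in results:
        # A page is unclassified only when its confidence is strictly
        # less than the threshold, as stated in the setting description.
        if row["confidence"] < threshold_percent:
            target = "Unclassified"
        else:
            target = row["category"]
        folders[target].append((row["fileName"], row["pageIndex"]))
    return dict(folders)


# Invented sample data using the documented keys.
sample_results = [
    {"fileName": "batch1.pdf", "pageIndex": 0, "category": "Invoice", "confidence": 92.5},
    {"fileName": "batch1.pdf", "pageIndex": 1, "category": "Invoice", "confidence": 48.0},
    {"fileName": "batch1.pdf", "pageIndex": 2, "category": "Receipt", "confidence": 81.0},
]

routed = route_pages(sample_results, threshold_percent=60)
for folder, pages in routed.items():
    print(folder, pages)
```

Running the same data against several candidate thresholds shows how many pages would fall into Unclassified at each value, which is one way to choose the setting.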