Using classification in data extraction workflow

To streamline document processing that involves both classification and data extraction, you can integrate the classification task into the document processing workflow. This integration eliminates the need for manual document pre-classification: a single automated workflow classifies the documents and then extracts data from them.

Prerequisites

Ensure that you have created the learning instance that you want to use with this process and published it to production. See Publish the learning instance to production.

In this procedure, we have used the Classify action in the Document Classifier package and created variables accordingly. Depending on the classifier action you choose, you might have to modify the procedure and create different sets of variables.

Procedure

  1. Log in to your Control Room.
  2. Navigate to Automation > Private tab.
  3. Click Create > Task Bot.
    Ensure you do not place the bot in the Document Workspace Processes folder.
  4. Provide a name for the bot, such as doc-processing-with-classification.
  5. Create the following variables:
    Variable name | Description | Data type | Value
    SourcePath | File path to the folder containing documents to be classified | String | Enter the file path where the documents to be classified are located
    ClassifiedFilePath | File path to the folder containing documents that are classified | String | Enter the file path where the classified documents are available
    OutputPath | File path to the folder containing the extracted data and invalid or failed documents | String | Enter the file path where you want the extraction output
    FilesInFolderClassification | Holds file name and extension | Dictionary | NA
    FoldersInFolderDataProcessing | Holds folder name | String | NA
    FilesInFolderDataProcessing | Holds file name and extension | Dictionary | NA

    See Create a variable.

  6. Insert a Loop action to iterate through all the documents to classify in a specific file path.
    1. Double-click or drag the Loop action to the editor.
    2. Select the For each file in folder iterator.
    3. In the Folder path field, enter $SourcePath$.
    4. In the Assign file name and extension to this variable field, enter $FilesInFolderClassification$.
  7. Configure actions for classifying documents.
    1. Drag the Classify action in the Document Classifier package into the Loop container.
    2. In the Input File field, select the Desktop file option, and enter $SourcePath$/$FilesInFolderClassification{name}$.$FilesInFolderClassification{extension}$.
    3. In the Classifier field, select the appropriate model file.
    4. In the Output folder path field, select the Desktop folder option, and enter $ClassifiedFilePath$.
  8. After the classification Loop container, insert a second Loop action to iterate through all the folders in a specific file path.
    1. Double-click or drag the Loop action to the editor.
    2. Select the For each folder in folder iterator.
    3. In the Folder path field, enter $ClassifiedFilePath$.
      Note: We have selected the ClassifiedFilePath variable for the folder path because the classified documents are stored in separate subfolders of this folder.
    4. In the Assign relative folder path to this variable field, enter $FoldersInFolderDataProcessing$.
  9. Inside the folder Loop container from the previous step, insert a Loop action to iterate through all the files for data processing in a specific file path.
    1. Double-click or drag the Loop action to the editor.
    2. Select the For each file in folder iterator.
    3. In the Folder path field, enter $ClassifiedFilePath$/$FoldersInFolderDataProcessing$.
    4. In the Assign file name and extension to this variable field, enter $FilesInFolderDataProcessing$.
  10. Configure actions to upload documents to the process associated with a specific learning instance.
    1. Drag the Create a request action in the Process Composer package into the Loop container that iterates through the files for data processing.
    2. In the Public Process field, click Browse and select a learning instance that is available in the public mode.
    3. In the File “InputFile” field, select the Desktop file option, and enter $ClassifiedFilePath$/$FoldersInFolderDataProcessing$/$FilesInFolderDataProcessing{name}$.$FilesInFolderDataProcessing{extension}$.
    4. In the String “InputFileName” field, enter $FilesInFolderDataProcessing{name}$.$FilesInFolderDataProcessing{extension}$.
    5. In the String “OutputFolder” field, enter $OutputPath$.
  11. Using the File package, you can perform the following actions:
    • Using the Copy Desktop file action, you can copy the files that were successfully processed to a different location on your desktop. For example, enter $ClassifiedFilePath$/$FoldersInFolderDataProcessing$/$FilesInFolderDataProcessing{name}$.$FilesInFolderDataProcessing{extension}$ in the Source file field.
    • Using the Delete action, you can remove documents after they are uploaded to Document Automation. For example, enter $ClassifiedFilePath$/$FoldersInFolderDataProcessing$/$FilesInFolderDataProcessing{name}$.$FilesInFolderDataProcessing{extension}$ in the File field.

    See File package.

  12. Click Save.
Now, when you run this automation, documents are classified first and then used in the learning instance for data extraction.
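
The control flow that the procedure builds can be sketched in Python. This is only an illustration of the loop structure and of how the path expressions resolve, not bot code: classify_stub and the create_request callback are hypothetical stand-ins for the Classify and Create a request actions, and the sketch assumes that the classifier writes each document into a per-category subfolder of the output folder, as the note in step 8 describes.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def name_parts(path: Path) -> Dict[str, str]:
    """Mimic the dictionary a file loop fills in: keys 'name' and 'extension'."""
    return {"name": path.stem, "extension": path.suffix.lstrip(".")}


def join_name(parts: Dict[str, str]) -> str:
    """Rebuild 'name.extension', the pattern used in the file fields."""
    if parts["extension"]:
        return f"{parts['name']}.{parts['extension']}"
    return parts["name"]


def classify_stub(input_file: Path, output_folder: Path) -> None:
    """Hypothetical stand-in for the Classify action (an assumption, not the
    real classifier): copies the file into a subfolder named after the text
    before the first underscore in the file name."""
    category = input_file.stem.split("_")[0]
    target = output_folder / category
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy2(input_file, target / input_file.name)


def run_workflow(
    source_path: Path,
    classified_path: Path,
    output_path: Path,
    classify: Callable[[Path, Path], None],
    create_request: Callable[[Path, str, str], None],
) -> List[Tuple[str, str]]:
    """Classify every source file, then submit every classified file."""
    source, classified = Path(source_path), Path(classified_path)
    classified.mkdir(parents=True, exist_ok=True)
    submitted: List[Tuple[str, str]] = []

    # Steps 6-7: loop over the files in SourcePath and classify each one.
    for entry in sorted(source.iterdir()):
        if entry.is_file():
            parts = name_parts(entry)  # plays the role of FilesInFolderClassification
            classify(source / join_name(parts), classified)

    # Step 8: loop over the category folders in ClassifiedFilePath.
    for folder in sorted(classified.iterdir()):
        if not folder.is_dir():
            continue
        # Step 9: nested loop over the files in the current folder.
        for entry in sorted(folder.iterdir()):
            if entry.is_file():
                file_name = join_name(name_parts(entry))
                # Step 10: InputFile, InputFileName, OutputFolder.
                create_request(folder / file_name, file_name, str(output_path))
                submitted.append((folder.name, file_name))
    return submitted
```

The first loop finishes before the folder loop starts, so every document is classified before any document is submitted for extraction. The file loop sits inside the folder loop because it depends on the folder name that the outer loop assigns on each iteration.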