Using pre-processing in data extraction workflow

To streamline document processing workflows that involve both pre-processing and data extraction, you can integrate the pre-processing task into the document processing workflow. This integration eliminates manual document pre-processing: a single automated workflow runs both tasks in sequence.

Prerequisites

Ensure that you have created the learning instance that you want to use with this process and published it to production. See Publish the learning instance to production.

This procedure uses the Enhance image action in the Pre-processor package and creates variables accordingly. Depending on the Pre-processor action you choose, you might have to modify the procedure and create a different set of variables.

Procedure

  1. Log in to your Control Room.
  2. Navigate to Automation > Private tab.
  3. Click Create > Task Bot.
    Ensure you do not place the bot in the Document Workspace Processes folder.
  4. Provide a name for the bot, such as doc-processing-with-classification.
  5. Create the following variables:
    Variable name | Description | Data type | Value
    SourcePath | File path to the folder containing documents to be pre-processed | String | Enter the file path where the documents to be pre-processed are located
    PreProcessedFilePath | File path to the folder containing documents that are pre-processed | String | Enter the file path where the pre-processed documents are available
    OutputPath | File path to the folder containing the extracted data and invalid or failed documents | String | Enter the file path where you want the extraction output
    FilesInFolderPreProcessing | Holds file name and extension | Dictionary | NA
    FilesInFolderDataProcessing | Holds file name and extension | Dictionary | NA

    See Create a variable.

  6. Insert a Loop action to iterate through all the documents to pre-process in a specific file path.
    1. Double-click or drag the Loop action to the editor.
    2. Select the For each file in folder iterator.
    3. In the Folder path field, enter $SourcePath$.
    4. In the Assign file name and extension to this variable field, enter $FilesInFolderPreProcessing$.
  7. Configure actions for pre-processing documents.
    1. Drag the Enhance image action in the Pre-processor package into the Loop container.
    2. In the Input File field, select the Desktop file option, and enter $SourcePath$/$FilesInFolderPreProcessing{name}$.$FilesInFolderPreProcessing{extension}$.
    3. In the Output Path field, select the Desktop folder option, and enter $PreProcessedFilePath$.
  8. Insert a Loop action to iterate through all the documents for data processing in a specific file path.
    1. Double-click or drag the Loop action to the editor.
    2. Select the For each file in folder iterator.
    3. In the Folder path field, enter $PreProcessedFilePath$.
    4. In the Assign file name and extension to this variable field, enter $FilesInFolderDataProcessing$.
  9. Configure actions to upload documents to the process associated with a specific learning instance.
    1. Drag the Create a request action in the Process Composer package into the Loop container.
    2. In the Public Process field, click Browse and select a learning instance that is available in the public mode.
    3. In the File “InputFile” field, select the Desktop file option, and enter $PreProcessedFilePath$/$FilesInFolderDataProcessing{name}$.$FilesInFolderDataProcessing{extension}$.
    4. In the String “InputFileName” field, enter $FilesInFolderDataProcessing{name}$.$FilesInFolderDataProcessing{extension}$.
    5. In the String “OutputFolder” field, enter $OutputPath$.
  10. Using the File package, you can perform the following actions:
    • Using the Copy Desktop file action, you can make a copy of the files that were successfully processed to a different location on your desktop. For example, enter $PreProcessedFilePath$/$FilesInFolderDataProcessing{name}$.$FilesInFolderDataProcessing{extension}$ in the Source file field.
    • Using the Delete action, you can remove documents after they are uploaded to Document Automation. For example, enter $PreProcessedFilePath$/$FilesInFolderDataProcessing{name}$.$FilesInFolderDataProcessing{extension}$ in the File field.

    See Text file package.

  11. Click Save.
Now, when you run this automation, documents are pre-processed first to enhance the image quality and then used in the learning instance for data extraction.
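
The control flow of the two loops in this procedure can be sketched in Python to make the sequence easier to follow. This is only an illustration of the logic, not the bot itself and not a product API: the function names (files_in_folder, build_path, run_workflow, enhance_stub) and the enhance and submit callables are hypothetical stand-ins for the For each file in folder iterator, the Enhance image action, and the Create a request action.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, List


def files_in_folder(folder: Path) -> List[Dict[str, str]]:
    """Stand-in for the 'For each file in folder' iterator.

    Returns one dictionary per file with 'name' and 'extension' keys,
    similar to what the loop assigns to its Dictionary variable.
    """
    return [
        {"name": p.stem, "extension": p.suffix.lstrip(".")}
        for p in sorted(folder.iterdir())
        if p.is_file()
    ]


def build_path(folder: Path, entry: Dict[str, str]) -> Path:
    """Rough equivalent of $Folder$/$Var{name}$.$Var{extension}$."""
    if entry["extension"]:
        return folder / f"{entry['name']}.{entry['extension']}"
    return folder / entry["name"]  # file without an extension


def run_workflow(
    source_path: Path,
    preprocessed_path: Path,
    output_path: Path,
    enhance: Callable[[Path, Path], None],
    submit: Callable[[Path, str, Path], None],
) -> List[str]:
    """Run both loops in sequence and return the submitted file names."""
    # Loop 1: pre-process every document found in the source folder.
    for entry in files_in_folder(source_path):
        enhance(build_path(source_path, entry), preprocessed_path)

    # Loop 2: send every pre-processed document for data extraction.
    submitted = []
    for entry in files_in_folder(preprocessed_path):
        input_file = build_path(preprocessed_path, entry)
        submit(input_file, input_file.name, output_path)
        submitted.append(input_file.name)
    return submitted


def enhance_stub(input_file: Path, output_folder: Path) -> None:
    """Hypothetical placeholder for image enhancement: copies the file as-is."""
    shutil.copy(input_file, output_folder / input_file.name)
```

Note that each loop reads from its own dictionary variable, which mirrors why the procedure defines FilesInFolderPreProcessing and FilesInFolderDataProcessing separately: the second loop iterates over the pre-processed folder, not the source folder.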