Process documents in Document Automation

Upload sample invoices to test the learning instance, verify the extracted data, and fix validation errors.

Prerequisites

  • If you have not done so already, create a learning instance. See Create a learning instance in Document Automation.
  • Verify that your device is connected to the Control Room: Install Bot Agent and register device
  • If the learning instance uses a Google Document AI model and you did not purchase Google Document AI licenses through Automation Anywhere, you must provide your Google Document AI credentials to the Extraction bot. See Configure key for Google Document AI.
  • If the learning instance uses an Automation Anywhere model, ensure that each file is 50 MB or less.

    If the learning instance uses a Google Document AI model, ensure that each file is 20 MB or less, with a maximum of 5 pages.

  • Ensure that the sample documents are in one of the following supported document types:
    • PDF
    • JPG
    • JPEG
    • PNG
    • TIF
    • TIFF
  • The default output format for the extracted data is a CSV file. To change the output to JSON, see Change output format from CSV to JSON.
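The file-size and format prerequisites above can be expressed as a quick pre-flight check before you upload. This is a minimal sketch: the limits come from the prerequisites, but the function name and folder-scanning approach are illustrative, and the 5-page limit for Google Document AI is not checked here (that would require a PDF library).

```python
# Pre-flight check for sample documents before uploading to a learning instance.
# Size limits (50 MB for Automation Anywhere models, 20 MB for Google Document AI
# models) and the supported formats come from the prerequisites above.
# Note: the 5-page limit for Google Document AI models is NOT checked here.
from pathlib import Path

SUPPORTED = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}
AA_LIMIT = 50 * 1024 * 1024      # Automation Anywhere models: 50 MB per file
GOOGLE_LIMIT = 20 * 1024 * 1024  # Google Document AI models: 20 MB per file

def check_samples(folder, google_model=False):
    """Return (file name, problem) pairs for files that violate the limits."""
    limit = GOOGLE_LIMIT if google_model else AA_LIMIT
    problems = []
    for f in Path(folder).iterdir():
        if not f.is_file():
            continue
        if f.suffix.lower() not in SUPPORTED:
            problems.append((f.name, "unsupported file format"))
        elif f.stat().st_size > limit:
            problems.append((f.name, "file exceeds size limit"))
    return problems
```

Running this against your sample folder before uploading avoids failed runs caused by oversized or unsupported files.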

Perform the following steps to upload sample invoices and test the data extraction capabilities of the learning instance.

Procedure

  1. Upload documents to test the learning instance:

    1. Click Process documents.
    2. In the Process Documents window, click Browse to select the files to upload.
    3. In the Download data to field, enter the folder path where the extracted data is saved.
      When the process runs, it creates the following three folders in the specified path:
      • Success: Contains the extracted data in the specified format (CSV or JSON).
      • Invalid: Holds documents marked invalid.
      • Failed: Holds documents that could not be processed.

      You can provide an output folder path based on one of the following options:

      • Option 1: The local device path if you have set up document processing and validation on the same device.

        This option is typically used when you are testing the learning instance.

      • Option 2: The shared folder path if you have set up distributed validation on separate devices.

        This option is typically used for published learning instances. For example, \\10.239.192.60\Sharepath\Output.

    4. Click Process documents.
      The Bot Runner window appears and closes when the documents finish processing. Refresh the Learning instances table to see the updated metrics.
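After a run completes, the three output folders created during processing (Success, Invalid, and Failed) can be tallied with a short sketch. The folder names are the ones the process creates; the helper name and base path are illustrative assumptions.

```python
# Summarize a Document Automation output folder after a run.
# The folder names (Success/Invalid/Failed) are created by the process;
# the base path you pass in is the one entered in the Download data to field.
from pathlib import Path

def summarize_output(base_path):
    """Count the files that landed in each result folder."""
    counts = {}
    for name in ("Success", "Invalid", "Failed"):
        folder = Path(base_path) / name
        if folder.is_dir():
            counts[name] = sum(1 for f in folder.iterdir() if f.is_file())
        else:
            counts[name] = 0
    return counts
```

A nonzero Invalid or Failed count tells you which documents need attention before you trust the extracted data.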

If there is a value next to the Validate documents link, you must manually validate the document fields. Otherwise, proceed to step 3.

  2. Fix the validation errors:
    1. Click Validate documents.
      The Automation Co-Pilot Task Manager opens in a new tab, with the first failed document in queue. For an introduction to the Validator user interface, see Validating documents through Automation Co-Pilot validator.
    2. Review each field to verify the data type and extracted value.
      Document Automation supports the following data types: text, number, date, address, and check box.
      Alternatively, from the drop-down list on the right panel, you can select Show fields that need validation.
      Note: When documents are awaiting validation, if you edit the learning instance, click Reprocess to reattempt extraction.

      Reprocessing documents does not affect the uploaded documents metric.

    3. Update the fields with errors.
      Click the field or draw a box around the values that you want to extract.
      For Automation Anywhere pre-trained models, you can configure the learning instance to extract specific values in a field and ignore others. For more information, see Use validation feedback to extract specific values in a table.
      • To skip a document without correcting errors, click Skip to proceed to the next document in the validation queue.
      • To remove a document that cannot be processed, click Mark as Invalid.
    4. After you make the necessary corrections, click Submit so that the document can finish processing.
      The next document in queue appears. When all the documents are corrected, the system displays a message stating that no more tasks are available.
    5. Close the tab to return to the Learning Instances page.
  3. Verify the output results:
    1. Open the file in the Success folder that contains the extracted data and review the results to ensure that they match your use case.
      The Microsoft Forms engine returns the extracted values (OCR data) in JSON format, for example GUID_0-MSFormTableResult.json. Along with the extracted document data in the <<GUID>>_FileName CSV file, the Success folder also contains the extracted table data in separate CSV files, one for each table in the document. For example, <<GUID_PAGE_NUMBER-Table_FILENAME_PAGENUMBER_TABLENUMBER>>.

      With separate table data, you can compare extracted data with Microsoft engine data in the GUID_0-MSFormTableResult.json file.

    2. Optional: Review the Learning Instance dashboard.
      The dashboard displays the total number of uploaded documents and the number of documents pending validation.
If the learning instance repeatedly cannot find a field or if characters are not correctly recognized (such as the letter "l" extracted as the number "1"), you can try changing the OCR to Google Vision OCR.
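The per-table CSV files and the Microsoft engine JSON described in the verification step can be paired up with a short script for side-by-side comparison. The glob patterns below are derived from the file-naming convention mentioned above and should be treated as assumptions; adjust them if your file names differ.

```python
# Collect per-table CSV files and the Microsoft engine JSON from a Success
# folder so the extracted table data can be compared side by side.
# The "-Table_" substring and the "MSFormTableResult.json" suffix follow the
# naming described in the verification step; treat the exact patterns as
# assumptions and adjust them to the file names you actually see.
import json
from pathlib import Path

def collect_table_results(success_folder):
    """Return sorted table CSV names and parsed engine JSON, keyed by file name."""
    folder = Path(success_folder)
    table_csvs = sorted(f.name for f in folder.glob("*-Table_*.csv"))
    engine_data = {}
    for j in folder.glob("*MSFormTableResult.json"):
        engine_data[j.name] = json.loads(j.read_text())
    return table_csvs, engine_data
```

With both in hand, you can open each table CSV next to the corresponding engine output and spot fields where the extraction disagrees.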

Next steps

Build a bot that uploads documents from a source folder to the learning instance. Then, publish the learning instance assets (process, form, and bots) to the public repository so that the learning instance can be used in public mode to extract data from real documents and validators can manually validate them. See Publish the learning instance to production.