Improve extraction accuracy through validation

Understand how the system improves extraction accuracy through user-provided changes in the Validator.

When a learning instance is created, the user can enable the option to send validation feedback, which passes user-provided changes from the Validator back to the learning instance. In Document Automation, learning instances running in production mode can continuously "learn" whenever a user resizes or relocates the extraction region in the Validator.

The following graphic provides a visual overview of the process by which learning instances continuously receive feedback from validation:

Process of "teaching" learning instances through validation feedback

  1. An uploaded document passes through the extraction engine.
  2. If the learning instance successfully extracts the data, the document is added to the straight-through processing (STP) count and the extracted values are downloaded to a file in the Success folder.

    If the learning instance cannot extract the data, the system evaluates whether the document contains an unfamiliar layout.

  3. If the learning instance does not recognize the document layout (new layout), the document is sent for manual validation where the user "teaches" the learning instance how to extract the data by setting the extraction region.
  4. The extracted values are downloaded to a file in the Success folder and the changes are collected in a feedback file, which is sent to the feedback database.
    Note:
    • Feedback is only collected when the user changes the extraction region. If the user manually inputs text, the system does not collect feedback.
    • The feedback file only contains data on the field location to improve extraction accuracy for subsequent documents.

    If the learning instance recognizes the document layout (an existing cluster), it retrieves previous feedback from the feedback database and uses it to extract the data.
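The routing logic in the steps above can be summarized in a short Python sketch. This is only an illustration of the decision flow, not the product's implementation: `FeedbackDB`, `route`, `record_validation`, and the region format are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical representation of an extraction region: (x, y, width, height).
Region = Tuple[int, int, int, int]


@dataclass
class FeedbackDB:
    """Hypothetical stand-in for the feedback database (layout -> field regions)."""
    by_layout: Dict[str, Dict[str, Region]] = field(default_factory=dict)

    def save(self, layout_id: str, field_name: str, region: Region) -> None:
        self.by_layout.setdefault(layout_id, {})[field_name] = region

    def lookup(self, layout_id: str) -> Optional[Dict[str, Region]]:
        return self.by_layout.get(layout_id)


def route(extracted: bool, layout_id: str, db: FeedbackDB) -> str:
    """Decide what happens to a document after the first extraction attempt."""
    if extracted:
        return "stp"                      # step 2: counted as straight-through processing
    if db.lookup(layout_id) is None:
        return "manual_validation"        # step 3: new layout, the user sets the region
    return "extract_with_feedback"        # known layout: reuse stored feedback


def record_validation(db: FeedbackDB, layout_id: str, field_name: str,
                      new_region: Optional[Region]) -> bool:
    """Collect feedback only when the user changed the extraction region.

    A value typed in manually carries no region (None), so nothing is stored.
    """
    if new_region is None:
        return False
    db.save(layout_id, field_name, new_region)
    return True
```

In this sketch, a document with an unknown layout keeps going to manual validation until a user changes at least one extraction region for that layout. After that, later documents with the same layout take the `extract_with_feedback` path.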

Use validation feedback to extract specific values in a table

As of Automation 360 v.27, you can train a learning instance to extract data from a cell that contains more than one field.

For example, if a product description column also includes an item number, you can outline the item number in the Validator. When the learning instance processes subsequent documents, it extracts the item number and ignores the product description.

Follow this process to configure a learning instance to extract specific values from a cell:
  1. Create a learning instance using an Automation Anywhere pretrained model and select the option to send validation feedback. See Create a learning instance in Document Automation.
  2. Upload a sample document. See Process documents in Document Automation.
  3. In the Validator, locate the field and redraw the box so that it surrounds only the values that you want to extract.
  4. After you click Submit, the information on the new extraction region is sent to the feedback database.
  5. Upload more documents to test the accuracy of extraction. When you are satisfied with the results, prepare the learning instance to run in production. See Publish the learning instance to production.
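To illustrate the effect of redrawing the box in step 3, the following Python sketch keeps only the words whose bounding boxes fall inside a stored region. The `Box`, `Word`, and `extract_from_region` names, the coordinates, and the sample cell contents are assumptions made for this example. The source does not describe how the product applies the region internally.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


class Word(NamedTuple):
    text: str
    box: Box


def inside(inner: Box, outer: Box) -> bool:
    """Return True if `inner` lies completely within `outer`."""
    return (inner.left >= outer.left and inner.top >= outer.top
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def extract_from_region(words: List[Word], region: Box) -> str:
    """Join the words that fall inside the user-drawn region, in reading order."""
    return " ".join(w.text for w in words if inside(w.box, region))


# Hypothetical cell: a product description on the first line, an item number below it.
cell_words = [
    Word("Blue", Box(100, 200, 140, 215)),
    Word("widget", Box(145, 200, 200, 215)),
    Word("SKU-1042", Box(100, 220, 180, 235)),
]

# Region redrawn in the Validator so that it surrounds only the item number.
item_number_region = Box(95, 218, 185, 238)

print(extract_from_region(cell_words, item_number_region))  # prints "SKU-1042"
```

Because the feedback file contains only field location data, a sketch like this stores the region rather than the extracted text.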

How Document Automation identifies new layouts

Document Automation extraction is based on object detection. During document processing, the extraction engine identifies objects, or key-value pairs of the field and associated value. The engine creates a "fingerprint" of the document, which stores the sequence of the objects and each object's location in the document.

When a document is processed, if the engine recognizes the keys and their locations, the document is classified and extracted based on that existing fingerprint. Otherwise, the engine saves a new fingerprint of the keys and their locations.

Process by which the engine either recognizes the existing fingerprint in a document or creates a new fingerprint
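The fingerprint comparison described in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a fingerprint is modeled as an ordered tuple of `(key, x, y)` entries, and two fingerprints match when the keys appear in the same sequence and each position differs by no more than a hypothetical `tolerance`. The actual matching rules used by the extraction engine are not described in the source.

```python
from typing import List, Sequence, Tuple

# Hypothetical fingerprint: ordered sequence of (key text, x, y) for each detected object.
Fingerprint = Tuple[Tuple[str, int, int], ...]


def make_fingerprint(objects: Sequence[Tuple[str, int, int]]) -> Fingerprint:
    """Store the sequence of detected keys and each key's location."""
    return tuple((key, x, y) for key, x, y in objects)


def matches(a: Fingerprint, b: Fingerprint, tolerance: int = 10) -> bool:
    """Same keys in the same order, each within `tolerance` units of the other."""
    if len(a) != len(b):
        return False
    return all(
        ka == kb and abs(xa - xb) <= tolerance and abs(ya - yb) <= tolerance
        for (ka, xa, ya), (kb, xb, yb) in zip(a, b)
    )


def classify(objects: Sequence[Tuple[str, int, int]],
             known: List[Fingerprint],
             tolerance: int = 10) -> Tuple[int, bool]:
    """Return (fingerprint index, is_new_layout) for a processed document.

    If no stored fingerprint matches, a new one is saved and flagged as new.
    """
    fingerprint = make_fingerprint(objects)
    for index, existing in enumerate(known):
        if matches(fingerprint, existing, tolerance):
            return index, False
    known.append(fingerprint)
    return len(known) - 1, True
```

For example, with an empty `known` list, the first document always creates a new fingerprint. A second document with the same keys at slightly shifted positions is classified under the existing fingerprint. A document with the keys in a different order is saved as a new layout.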