Create a learning instance using Google CDE

A learning instance is a structure that holds information such as the document type, language, and the fields to be extracted. After creating a custom extraction processor, you must create a learning instance to extract data from documents.

Prerequisites

  • Ensure you have successfully created and trained a Google Custom Document Extractor (CDE) processor.
  • Ensure your Control Room has the Document Workspace (Number of pages) product license.
  • Ensure you have configured bring your own key (BYOK). For more information, see Configure bring your own key BYOK for Google CDE.

To use a new processor from Google Document AI, create a learning instance with Google Document AI (User-defined) as the provider. With this option, you define form and table fields whose names match the names used in the processor.
Note:
  • Currently, Google Document AI supports single table extraction.
  • The check box feature (in preview mode) might extract check box fields inconsistently. If the system cannot accurately extract a check box field value, the value is labeled Not Found.
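
The name-matching requirement described above can be checked before you create the fields. The following is a minimal Python sketch, not part of the product: the function name, the sample labels, and the assumption that names must match exactly (including case) are all illustrative.

```python
def compare_field_names(instance_fields, schema_labels):
    """Report names that exist on only one side.

    Assumption: names are compared exactly, including case.
    """
    instance_set = set(instance_fields)
    schema_set = set(schema_labels)
    return {
        # Labels in the processor schema with no matching field.
        "missing_in_learning_instance": sorted(schema_set - instance_set),
        # Fields with no matching label in the processor schema.
        "not_in_processor_schema": sorted(instance_set - schema_set),
    }


# Hypothetical sample data for illustration only.
schema_labels = ["invoice_number", "invoice_date", "total_amount"]
instance_fields = ["invoice_number", "Invoice_Date", "total_amount"]

result = compare_field_names(instance_fields, schema_labels)
print(result)
```

In this sample, the field Invoice_Date differs from the label invoice_date only by case, so it is reported on both sides of the comparison.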

Procedure

  1. From the Control Room home page, navigate to Manage > Learning Instances > Create Learning Instance.
    The Create Learning Instance window opens in a new tab.
  2. Enter a name for the new learning instance.
  3. From the Document Type drop-down menu, select User-defined.
  4. From the Provider menu, select Google Document AI (User-defined).
  5. Click Next.
  6. Select the Form fields or Table fields tab.
  7. Create new fields with the same names as the schema labels used in the Google CDE processor.
    Note: The names must match for both form fields and table fields.
  8. Click Create.

    When a new learning instance is created, the Control Room creates a folder with the same name as the learning instance in the Automation > Document Workspace Processes folder.

    You can add custom form and table fields for Google Document AI learning instances. When you want to extract data from fields that Google does not support, you can create custom fields. With this enhancement, you can use pre-trained models from Google along with custom fields for document extraction.

    Consider the following points when you add custom fields for Google Document AI learning instances:
    • You can add custom form and table fields for document types.
    • You can edit and save the custom fields.
    • A regular expression (RegEx) is available for the custom fields.
    • You can add custom fields for existing learning instances that are attached to the old package.

      In this scenario, when you save the learning instance, a notification displays to update the package version.

    • When a package is not compatible with multiple features, a message displays corresponding to the highest package version.
    • You can import or export the custom fields to or from the .dw file along with settings.
    • When you extract the custom fields, these fields are backward compatible with the older package version.
      • When a learning instance uses the custom fields, the old package (v.29 and earlier) does not throw an error; it returns empty values for the custom fields.
      • Similar to the standard fields, the old package (v.29) applies normalization and rules for custom fields, if applicable.
  9. Update the extraction bot of the learning instance with the service account and the processor endpoint URL.
    1. Open the bot for the learning instance from Automation > Document Workspace Processes > <LI name> > <LI name>_extractionbot.
    2. From the Additional settings option, select Google DocAI.
    3. In the Service account field, pick the credential vault locker, credential, and attribute where the service account key is stored. For more information, see Configure bring your own key BYOK for Google CDE.
    4. Copy the prediction endpoint URL from the Google CDE processor.
      Prediction endpoint in Google Document AI
    5. Paste the copied URL in the Endpoint URL for document processor field.

      Document AI endpoint URL for document processor
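
Before pasting the prediction endpoint URL, it can help to confirm that it was copied completely. The following is a minimal Python sketch, not part of the product. It assumes the URL follows the general shape of a Google Document AI process endpoint (a location-prefixed documentai.googleapis.com host, project, location, and processor ID, ending in :process). The pattern, the function name, and the sample URL are illustrative assumptions, so verify against the URL shown in your processor.

```python
import re

# Assumed general shape of a Document AI prediction endpoint.
ENDPOINT_PATTERN = re.compile(
    r"^https://(?P<host_location>[a-z0-9-]+)-documentai\.googleapis\.com/"
    r"(?P<version>v1(?:beta\d+)?)/projects/(?P<project>[^/]+)/"
    r"locations/(?P<location>[^/]+)/processors/(?P<processor>[^/:]+)"
    r"(?:/processorVersions/(?P<processor_version>[^/:]+))?:process$"
)


def parse_endpoint(url):
    """Return the URL parts as a dict, or None if the shape does not match."""
    match = ENDPOINT_PATTERN.match(url.strip())
    return match.groupdict() if match else None


# Hypothetical sample URL for illustration only.
sample_url = (
    "https://us-documentai.googleapis.com/v1/projects/123456789"
    "/locations/us/processors/abc123def456:process"
)
parsed = parse_endpoint(sample_url)
print(parsed)
```

A result of None suggests the URL was truncated or altered when copied, for example if the trailing :process was dropped.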

Next steps

Upload documents to the learning instance, fix validation errors, and verify the extracted data. For more information, see Process documents in Document Automation.