Document Automation support for Google Custom Document Extractor (CDE)

In Document Automation, you can create a user-trained learning instance and extract data using a Google Custom Document Extractor (CDE) processor.

You can use this capability to train a Google Custom Document Extractor (CDE) model for any document type in any of 50 languages. After the model is deployed, you can embed its processor URL in the Document Automation extraction process.
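The processor URL follows the pattern of the example endpoint given in the trusted-list note in this section. The following Python sketch composes such a URL from a project ID, a location, and a processor ID. The function name `build_processor_url` and the sample IDs are hypothetical, and the sketch assumes the regional host form `<location>-documentai.googleapis.com` seen in that example. When in doubt, use the endpoint shown for the deployed processor in the Google Cloud console.

```python
import re

# Allowed characters for a single ID segment (assumption: letters, digits, "_" and "-").
_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def build_processor_url(project_id: str, location: str, processor_id: str) -> str:
    """Compose a processor endpoint URL in the format of this section's example.

    Assumption: the host is regional, of the form <location>-documentai.googleapis.com.
    """
    for name, value in (
        ("project_id", project_id),
        ("location", location),
        ("processor_id", processor_id),
    ):
        # Reject empty values or characters that would break the URL host or path.
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
    return (
        f"https://{location}-documentai.googleapis.com"
        f"/v1/projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}:process"
    )


# Hypothetical IDs, for illustration only.
print(build_processor_url("my-project", "eu", "abc123"))
# https://eu-documentai.googleapis.com/v1/projects/my-project/locations/eu/processors/abc123:process
```

Validating each segment up front keeps a typo in an ID from silently producing a URL that points at the wrong host or path.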

To use Google CDE, you must have the following:
  • A Google subscription to Google Document AI Workbench.
  • A license for Document Automation Platform > Document Workspace pages.
Note: If you use an API URL trusted list with Google CDE, you must add all of the following APIs to the trusted list on the Bot Agent machine:
  • Google accounts
  • Google OAuth
  • Google APIs
  • Processor endpoint (add only the host to the trusted list)
    For example, for the processor endpoint
    https://eu-documentai.googleapis.com/v1/projects/<<Project ID>>/locations/eu/processors/<<Processor ID>>:process
    add only the host eu-documentai.googleapis.com.
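Because only the host of the processor endpoint goes on the trusted list, it can help to extract that host programmatically rather than by hand. The following Python sketch uses the standard library's `urllib.parse.urlsplit` to pull out the host and check it against a set of allowed hosts. The function names and the sample trusted list are hypothetical and do not reflect how the Bot Agent actually stores its trusted list.

```python
from urllib.parse import urlsplit


def endpoint_host(processor_endpoint: str) -> str:
    """Return the host of a processor endpoint URL (the part that goes on the trusted list)."""
    parts = urlsplit(processor_endpoint)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"expected an https URL with a host, got {processor_endpoint!r}")
    # .hostname drops any port and returns the host in lowercase.
    return parts.hostname


def is_trusted(processor_endpoint: str, trusted_hosts: set[str]) -> bool:
    """Check whether the endpoint's host is on a (hypothetical) trusted list of hosts."""
    return endpoint_host(processor_endpoint) in {h.lower() for h in trusted_hosts}


# Example endpoint from this section, with the placeholders left as-is.
endpoint = (
    "https://eu-documentai.googleapis.com/v1/projects/<<Project ID>>"
    "/locations/eu/processors/<<Processor ID>>:process"
)
print(endpoint_host(endpoint))  # eu-documentai.googleapis.com
print(is_trusted(endpoint, {"eu-documentai.googleapis.com"}))  # True
```

The path, including the project ID, processor ID, and the `:process` suffix, is ignored, so one host entry covers every processor in that region.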

Usage of Google CDE

Creating and maintaining models with Google CDE requires effort, which is justified in scenarios such as the following:
  • Extended language support: When documents are in languages that existing pre-trained models do not support, Google CDE becomes essential.

    For supported languages, see Language support for Google CDE.

  • Unsupported document formats: Google CDE is beneficial when dealing with document types that lack compatible parsers.
  • Addressing accuracy and performance challenges: For some document formats, pre-trained models cannot reach the desired accuracy. Training Google CDE on your own documents can improve accuracy.
  • Custom or non-standard field extraction: Google CDE can be used in scenarios where specific fields need to be extracted from documents that have custom or non-standard formats.
  • Extraction based on specific training when labels do not exist: Google CDE is beneficial when you need to extract information from fields for which no predefined labels exist.