Vision-powered generative AI models enhance document automation by improving data extraction accuracy from complex and unstructured documents, utilizing capabilities like layout analysis and form field recognition. These models streamline workflows by reducing human intervention and are supported across various regions by providers such as Microsoft OpenAI and Anthropic Claude.

Note: To use vision-powered generative AI models, ensure that you are using Document Extraction package version 3.35.14 or later.

Integrating vision-powered generative AI models into Document Automation helps you process documents with visually complex structures, for example by recognizing checkboxes and detecting signatures.

When you use a package version that supports vision-powered generative AI models, add the @GenAIVision prompt tag to instruct the Document Extraction engine to use these models for data extraction. For more information, see Using prompt tags in generative AI prompts.

Capabilities

The following image shows some of the capabilities of the vision-powered generative AI models used in Document Automation:

Figure: Capabilities of Document Automation with vision-powered generative AI models

Document Automation with vision-powered generative AI models provides the following enhanced capabilities over other generative AI models:

  • Layout analysis
  • Form field recognition
  • Table recognition
  • Image and graphic recognition
  • Signature and checkbox recognition

Benefits

Vision-powered generative AI models provide the following benefits:

  • Seamless data extraction: Extracts data from complex tables with nested rows, merged columns, and sections. Recognizes and captures selection elements such as checkboxes.
  • Developed for real-world use cases: Overcomes challenges in extracting data from various document types such as invoices, purchase orders, healthcare documents, and supply chain documents.
  • Effortless setup: Uses pre-trained models that work out of the box and rely on search queries to identify and extract information.

Regions support matrix

The following table lists the vision-powered generative AI models that each generative AI provider supports in different regions:

Note: If you use bring your own license (BYOL) for a provider, these settings do not apply. To configure BYOL for a provider, see Extract data action.
Region             Provider                           Vision-powered model supported?  Supported generative AI model
United States      Microsoft OpenAI                   Yes                              GPT-4o
United States      Anthropic Claude (Amazon Bedrock)  Yes                              Claude 3 Haiku
Europe             Microsoft OpenAI                   Yes                              GPT-4o
Europe             Anthropic Claude (Amazon Bedrock)  Yes                              Claude 3 Haiku
Rest of the world  Microsoft OpenAI                   No*                              GPT-3.5 Turbo
Rest of the world  Anthropic Claude (Amazon Bedrock)  Yes                              Claude 3 Haiku

* You can configure BYOL to use your own vision-powered generative AI model for data extraction. See Extract data action.
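
If you want to check the matrix from a script, for example to validate a deployment plan, the following is a minimal Python sketch that encodes the table as a lookup. It is not part of the product: the names ModelSupport, SUPPORT_MATRIX, lookup_support, and vision_providers are hypothetical. The values are transcribed from the table above and cover only the default (non-BYOL) settings.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class ModelSupport(NamedTuple):
    """One row of the region support matrix (hypothetical helper type)."""
    vision_supported: bool
    model: str


# Values transcribed from the region support matrix above (default, non-BYOL settings).
SUPPORT_MATRIX: Dict[Tuple[str, str], ModelSupport] = {
    ("United States", "Microsoft OpenAI"): ModelSupport(True, "GPT-4o"),
    ("United States", "Anthropic Claude (Amazon Bedrock)"): ModelSupport(True, "Claude 3 Haiku"),
    ("Europe", "Microsoft OpenAI"): ModelSupport(True, "GPT-4o"),
    ("Europe", "Anthropic Claude (Amazon Bedrock)"): ModelSupport(True, "Claude 3 Haiku"),
    ("Rest of the world", "Microsoft OpenAI"): ModelSupport(False, "GPT-3.5 Turbo"),
    ("Rest of the world", "Anthropic Claude (Amazon Bedrock)"): ModelSupport(True, "Claude 3 Haiku"),
}


def lookup_support(region: str, provider: str) -> Optional[ModelSupport]:
    """Return the matrix entry for a region/provider pair, or None if it is not listed."""
    return SUPPORT_MATRIX.get((region, provider))


def vision_providers(region: str) -> List[str]:
    """List the providers that offer a vision-powered model in the given region."""
    return [
        provider
        for (row_region, provider), support in SUPPORT_MATRIX.items()
        if row_region == region and support.vision_supported
    ]
```

For example, vision_providers("Rest of the world") returns only the Anthropic Claude (Amazon Bedrock) entry, which matches the "No*" row for Microsoft OpenAI in that region.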