Automation 360

Vision-powered generative AI data extraction

Updated: 2026/03/11

Vision-powered generative AI models improve the accuracy of data extraction in Document Automation for complex and unstructured documents, using capabilities such as layout analysis and form field recognition. These models streamline workflows by reducing the need for human intervention and are available in several regions through providers such as Microsoft OpenAI, Anthropic Claude, and Google Gemini.

Note: Ensure that you are using the Document Extraction package version 3.35.14 or later to use the vision-powered generative AI models.

Integrating vision-powered generative AI models into Document Automation helps you process documents with visually complex structures, for example by recognizing checkboxes and detecting signatures.

When you use a package version that supports vision-powered generative AI models, you can use the @GenAIVision prompt tag to instruct the Document Extraction engine to use these models for data extraction. For more information, see Using prompt tags in generative AI prompts.
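The two prerequisites described in this section, a Document Extraction package at version 3.35.14 or later and the @GenAIVision tag in the prompt, can be expressed as a simple pre-flight check. This is a minimal sketch under stated assumptions: the function names are hypothetical and not part of any Automation Anywhere API, the engine's actual tag parsing is not modeled, a build suffix after a hyphen in the version string is assumed, and placing the tag at the start of the prompt in the usage lines is only illustrative. See Using prompt tags in generative AI prompts for the supported syntax.

```python
import re
from typing import Tuple

# Hypothetical pre-flight check; these names are illustrative and are not part
# of any Automation Anywhere API.
VISION_TAG_PATTERN = re.compile(r"@GenAIVision\b")
MIN_PACKAGE_VERSION = (3, 35, 14)  # minimum Document Extraction package version


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.35.14' (or '3.35.14-<build>') into a comparable tuple of ints."""
    core = version.split("-", 1)[0]  # assumption: a build suffix may follow a hyphen
    return tuple(int(part) for part in core.split("."))


def package_supports_vision(version: str) -> bool:
    """True if the package version is 3.35.14 or later."""
    return parse_version(version) >= MIN_PACKAGE_VERSION


def requests_vision(prompt: str) -> bool:
    """True if the prompt contains the @GenAIVision prompt tag."""
    return VISION_TAG_PATTERN.search(prompt) is not None


def will_use_vision(package_version: str, prompt: str) -> bool:
    """Both prerequisites must hold for vision-powered extraction."""
    return package_supports_vision(package_version) and requests_vision(prompt)


# Usage (tag placement is illustrative only):
print(will_use_vision("3.35.14", "@GenAIVision Is the consent checkbox selected?"))  # True
print(will_use_vision("3.34.0", "@GenAIVision Is the consent checkbox selected?"))   # False
```

Comparing versions as integer tuples avoids the string-comparison trap where "3.9" would sort after "3.35".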

Capabilities

The following image shows some of the capabilities of the vision-powered generative AI models used in Document Automation:

Figure: Capabilities of vision-powered generative AI models in Document Automation

With vision-powered generative AI models, Document Automation provides the following enhanced capabilities compared with other generative AI models:

Layout analysis
Form field recognition
Table recognition
Image and graphic recognition
Signature and checkbox recognition

Benefits

Vision-powered generative AI models provide the following benefits:

Seamless data extraction: Extracts data from complex tables with nested rows, merged columns, and sections. Recognizes and captures selection elements such as checkboxes.
Developed for real-world use cases: Overcomes challenges in extracting data from various document types such as invoices, purchase orders, healthcare documents, and supply chain documents.
Effortless setup: Uses pre-trained models that work out of the box, using search queries to identify and extract information.

Regions support matrix

The following table lists the vision-powered generative AI models that each generative AI provider supports in each region:

Note:

If you use bring your own key (BYOK), where models are hosted in your own account, the information in this matrix does not apply. For instructions on how to configure BYOK, see Extract data action.
When you use BYOK, use Model connections. For more information, see Using Model connections in Document Automation.


Region	Provider	Vision-powered generative AI model supported?	Supported generative AI model
United States	Microsoft OpenAI	Yes	GPT-5.1
United States	Anthropic Claude (Amazon Bedrock)	Yes	Claude Haiku 4.5
United States	Google Gemini	Yes	Gemini 2.5 Flash
Europe	Microsoft OpenAI	Yes	GPT-5.1
Europe	Anthropic Claude (Amazon Bedrock)	Yes	Claude Haiku 4.5
Europe	Google Gemini	Yes	Gemini 2.5 Flash
Australia	Microsoft OpenAI	Yes	GPT-4.0
Australia	Anthropic Claude (Amazon Bedrock)	Yes	Claude Haiku 4.5
India	Microsoft OpenAI	Yes	GPT-4.0
India	Anthropic Claude (Amazon Bedrock)	Yes	Claude 3 Haiku
Canada	Microsoft OpenAI	Yes	GPT-4.0
Japan	Microsoft OpenAI	Yes	GPT-4.0
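The support matrix can also be kept as a lookup table, for example to check a region and provider combination before you configure extraction. The following is a minimal Python sketch: the region, provider, and model strings are copied from the table, while the `vision_model_for` helper and its `byok` flag are hypothetical. Because the matrix does not apply when you use BYOK, the helper raises an error in that case instead of returning a model.

```python
from typing import Dict, Optional, Tuple

# Vision-powered model support matrix, copied from the table above.
# Keys are (region, provider); values are the supported model.
SUPPORTED_VISION_MODELS: Dict[Tuple[str, str], str] = {
    ("United States", "Microsoft OpenAI"): "GPT-5.1",
    ("United States", "Anthropic Claude (Amazon Bedrock)"): "Claude Haiku 4.5",
    ("United States", "Google Gemini"): "Gemini 2.5 Flash",
    ("Europe", "Microsoft OpenAI"): "GPT-5.1",
    ("Europe", "Anthropic Claude (Amazon Bedrock)"): "Claude Haiku 4.5",
    ("Europe", "Google Gemini"): "Gemini 2.5 Flash",
    ("Australia", "Microsoft OpenAI"): "GPT-4.0",
    ("Australia", "Anthropic Claude (Amazon Bedrock)"): "Claude Haiku 4.5",
    ("India", "Microsoft OpenAI"): "GPT-4.0",
    ("India", "Anthropic Claude (Amazon Bedrock)"): "Claude 3 Haiku",
    ("Canada", "Microsoft OpenAI"): "GPT-4.0",
    ("Japan", "Microsoft OpenAI"): "GPT-4.0",
}


def vision_model_for(region: str, provider: str, byok: bool = False) -> Optional[str]:
    """Return the supported vision model for a region and provider, or None.

    Hypothetical helper: with BYOK the matrix does not apply, so this raises
    instead of guessing.
    """
    if byok:
        raise ValueError("The support matrix does not apply to BYOK; use Model connections.")
    return SUPPORTED_VISION_MODELS.get((region, provider))


print(vision_model_for("India", "Anthropic Claude (Amazon Bedrock)"))  # Claude 3 Haiku
print(vision_model_for("Canada", "Google Gemini"))  # None: combination not listed
```

Returning None for an unlisted combination keeps "not in the matrix" distinct from "matrix not applicable", which the BYOK case signals with an exception.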

More resources

To learn more, search for the Vision Powered Generative AI Data Extraction course in Automation Anywhere University: RPA Training and Certification (A-People login required).