Vision-powered generative AI data extraction
- Updated: 2025/02/11
Vision-powered generative AI models improve the accuracy of data extraction from complex and unstructured documents in Document Automation by using capabilities such as layout analysis and form field recognition. These models reduce the need for human intervention in extraction workflows and are available in several regions through providers such as Microsoft OpenAI and Anthropic Claude.
Integrating vision-powered generative AI models into Document Automation helps process documents with visually complex structures, for example by recognizing checkboxes and detecting signatures.
When you use the package that supports vision-powered generative AI models, add the @GenAIVision prompt tag to a prompt to instruct the Document Extraction engine to use a vision-powered generative AI model for data extraction. For more information, see Using prompt tags in generative AI prompts.
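How the engine interprets the tag is internal to Document Automation, but if you maintain many field prompts, a small check can catch visually complex fields (checkboxes, signatures) whose prompt is missing the tag. The following Python sketch assumes the tag is written as a standalone token somewhere in the prompt text; `requests_vision_model` and `fields_without_vision` are hypothetical helpers, not part of any Automation Anywhere API.

```python
import re

# Hypothetical helpers, not part of any Automation Anywhere API.
# Assumption: @GenAIVision appears as a standalone token anywhere in the prompt text.
VISION_TAG = "@GenAIVision"
_TAG_PATTERN = re.compile(r"(?<!\S)" + re.escape(VISION_TAG) + r"(?!\S)")


def requests_vision_model(prompt: str) -> bool:
    """Return True if the prompt contains the @GenAIVision tag as a standalone token."""
    return _TAG_PATTERN.search(prompt) is not None


def fields_without_vision(prompts: dict[str, str], visual_fields: set[str]) -> list[str]:
    """List visually complex fields whose prompt does not carry the tag, sorted by name."""
    return sorted(
        name
        for name in visual_fields
        if name in prompts and not requests_vision_model(prompts[name])
    )


if __name__ == "__main__":
    # Illustrative prompts; field names and wording are made up for this example.
    prompts = {
        "invoice_number": "Extract the invoice number.",
        "is_paid": "@GenAIVision Is the 'Paid' checkbox selected? Answer yes or no.",
        "signed": "Is the document signed by the customer?",
    }
    print(fields_without_vision(prompts, {"is_paid", "signed"}))  # ['signed']
```

Matching the tag only as a standalone token avoids false positives from text such as `@GenAIVisionX`; adjust the pattern if your prompts place the tag differently.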
Capabilities
The following image shows some of the capabilities of the vision-powered generative AI models used in Document Automation:
Document Automation with vision-powered generative AI models provides the following capabilities beyond those of other generative AI models:
- Layout analysis
- Form field recognition
- Table recognition
- Image and graphic recognition
- Signature and checkbox recognition
Benefits
Vision-powered generative AI models provide the following benefits:
- Seamless data extraction
  - Extracts data from complex tables with nested rows, merged columns, and sections, and recognizes selection elements such as checkboxes.
- Developed for real-world use cases
  - Handles data extraction from many document types, such as invoices, purchase orders, healthcare documents, and supply chain documents.
- Effortless setup
  - Uses pre-trained models that work out of the box; you identify and extract information with search queries.
Region support matrix
The following table provides the vision-powered generative AI models supported by the generative AI providers in different regions:
Region | Provider | Vision-powered model supported? | Supported generative AI models |
---|---|---|---|
United States | Microsoft OpenAI | Yes | GPT-4o |
United States | Anthropic Claude (Amazon Bedrock) | Yes | Claude 3 Haiku |
Europe | Microsoft OpenAI | Yes | GPT-4o |
Europe | Anthropic Claude (Amazon Bedrock) | Yes | Claude 3 Haiku |
Rest of the world | Microsoft OpenAI | No* | GPT-3.5 Turbo |
Rest of the world | Anthropic Claude (Amazon Bedrock) | Yes | Claude 3 Haiku |
* You can configure BYOL to use your own vision-powered generative AI model for data extraction. See Extract data action.
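If you script environment checks, the region support matrix can be encoded as a lookup table. The following is a minimal Python sketch; `SUPPORT_MATRIX` and `vision_model_for` are hypothetical names, not part of any Automation Anywhere API, and the data reflects this page as of its 2025/02/11 update, so verify it against the current documentation before relying on it.

```python
from typing import Optional

# Region support matrix from this page (2025/02/11), encoded as a lookup table.
# Key: (region, provider). Value: (vision-powered model supported?, listed model).
SUPPORT_MATRIX = {
    ("United States", "Microsoft OpenAI"): (True, "GPT-4o"),
    ("United States", "Anthropic Claude (Amazon Bedrock)"): (True, "Claude 3 Haiku"),
    ("Europe", "Microsoft OpenAI"): (True, "GPT-4o"),
    ("Europe", "Anthropic Claude (Amazon Bedrock)"): (True, "Claude 3 Haiku"),
    ("Rest of the world", "Microsoft OpenAI"): (False, "GPT-3.5 Turbo"),
    ("Rest of the world", "Anthropic Claude (Amazon Bedrock)"): (True, "Claude 3 Haiku"),
}


def vision_model_for(region: str, provider: str) -> Optional[str]:
    """Return the vision-capable model for a region/provider pair.

    Returns None when the listed model is not vision-powered (the "No*" case,
    where BYOL is the documented option). Raises KeyError for pairs not in the matrix.
    """
    supported, model = SUPPORT_MATRIX[(region, provider)]
    return model if supported else None
```

For example, `vision_model_for("Europe", "Microsoft OpenAI")` returns `"GPT-4o"`, while `vision_model_for("Rest of the world", "Microsoft OpenAI")` returns `None`, signaling that you would need to configure BYOL. Raising `KeyError` for unknown pairs keeps a typo in a region or provider name from silently reading as "not supported".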