Document Automation - Data extraction using generative AI

Document Automation for Automation 360 Cloud and On-Premises provides a generative AI (GenAI) capability that extracts data from unstructured and semi-structured documents without prior training. Create a learning instance with the GenAI capability to process documents in English using a large language model (LLM).

Note: Generative AI models can produce errors and/or misrepresent the information they generate. It is advisable to verify the accuracy, reliability, and completeness of the content generated by the AI model.

Benefits

Improve extraction accuracy in a learning instance by using the Search query for generative AI model feature when defining form and table fields. Document Automation offers a default query, based on the selected field, that you can customize. Sending your query to the GenAI model enables data extraction from different document types without prior training. Use this capability to extend your document processing.

How generative AI improves extraction

When you create a learning instance for unstructured documents (such as Contracts, Agreements, Reports, Letters, and Emails), the GenAI-driven data extraction capability is automatically selected. While defining the Form fields and Table fields for your learning instance, you can use the Search query for generative AI model option to customize your data extraction request.

For example, for an address field, Document Automation provides a default query such as ‘What is the Property Address?’. For more focused extraction, you can customize this query to ‘What is the full Property Address with city, state and zip code?’

Figure: Document Automation data extraction using generative AI

When a document is processed using this learning instance, the GenAI capability extracts the complete address instead of just the street name and number. You define the search query in the model only once; every document processed using this model is then extracted with no additional configuration.
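The relationship between a field, its default query, and a customized query can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the product's API: the `FieldDefinition` class, its attributes, and the "What is the <field name>?" pattern for the default query are assumptions made for this example, modeled on the Property Address query shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldDefinition:
    """Hypothetical model of a form or table field in a learning instance."""

    name: str
    custom_query: Optional[str] = None  # set only when the user edits the query

    def default_query(self) -> str:
        # Assumed pattern, mirroring the 'What is the Property Address?' example.
        return f"What is the {self.name}?"

    def search_query(self) -> str:
        # A customized query, when present, replaces the default one.
        return self.custom_query or self.default_query()


default_field = FieldDefinition(name="Property Address")
focused_field = FieldDefinition(
    name="Property Address",
    custom_query="What is the full Property Address with city, state and zip code?",
)

print(default_field.search_query())
print(focused_field.search_query())
```

The query is stored with the field definition, which reflects the point that it is defined once and then reused for every document processed with the same learning instance.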

When creating a learning instance for semi-structured documents such as Invoices, Purchase orders, and User-defined, or for supply-chain documents such as Waybill, Bill of Lading, Arrival Notice, and Packing Lists, you can use the GenAI-driven data extraction capability in addition to the native extraction capability based on user-provided updates in the Validator.

Important:
  • Privacy Notice: When the generative AI capability is selected, the query is sent to a third-party service. Currently, the data is sent to Microsoft Azure OpenAI service or Anthropic that is available on Amazon Bedrock or Google Vertex AI. If you do not want your data sent to a third-party service, we recommend not using the unstructured and semi-structured document types that use the generative AI feature out-of-the-box. For the region support matrix, see Document Automation settings.
  • When a generative AI query does not find a matching value, the generative AI model returns a blank value or an empty response. In that case, refine your query to get the desired result.
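The advice to adjust a query after a blank response can be pictured as trying a list of progressively refined queries until one returns a value. The sketch below is a minimal illustration under stated assumptions: `first_non_blank_answer` is a hypothetical helper, and `fake_ask` is a stand-in with made-up canned data that replaces any real model call. In the product, queries are refined in the learning instance's field settings rather than in code.

```python
from typing import Callable, Iterable, Optional


def first_non_blank_answer(
    queries: Iterable[str],
    ask: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the first non-blank answer, trying the queries in order."""
    for query in queries:
        answer = ask(query)
        # Treat None, an empty string, and whitespace-only text as "blank".
        if answer is not None and answer.strip():
            return answer.strip()
    return None


# Made-up canned responses standing in for a model (illustrative data only).
CANNED_ANSWERS = {
    "What is the full Property Address with city, state and zip code?":
        "12 Example Street, Springfield, IL 62701",
}


def fake_ask(query: str) -> str:
    # Hypothetical stub: returns an empty string when nothing matches.
    return CANNED_ANSWERS.get(query, "")


result = first_non_blank_answer(
    [
        "What is the Property Address?",
        "What is the full Property Address with city, state and zip code?",
    ],
    fake_ask,
)
print(result)
```

Returning `None` when every query comes back blank keeps the "no result" case explicit, so a caller can route the document to manual review instead of storing an empty value.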