Create learning instance with generative AI for semi-structured documents

Use this topic as a guide to create a learning instance that uses the generative AI (GenAI) capability to extract data from semi-structured documents such as Invoices, Purchase Orders, and User-defined documents, or from supply chain documents such as Arrival Notice, Bill of Lading, Packing Lists, and Waybill.

To extract data from semi-structured documents, you can use the generative AI capability in addition to the out-of-the-box user-validation feedback feature, which you turn on by selecting the Improve accuracy using validation option while creating a learning instance. Using both helps produce more consistent and accurate data extraction. The following steps walk you through creating a learning instance with the generative AI capability for semi-structured documents.

Prerequisites

  • For supply chain documents, the generative AI-driven data extraction feature is enabled by default and cannot be disabled. You must therefore enable generative AI and other external connections to Document Automation so that documents are processed without errors. See Enable generative AI and other external connections to Document Automation.
  • A Professional Developer performs the following tasks:
    • Create, edit, and delete learning instances
    • Upload documents for processing and testing
    • Check in and check out learning instances between private and public folders
  • License requirement: A Bot Creator license is required to perform the above tasks.

  • Assigned roles and permissions:
    • AAE_IQBot Services or AAE_IQBot Admin
    • AAE_Basic

Procedure

  1. Log in to the Control Room, navigate to AI > Document Automation, and click Create Learning Instance to start creating a new learning instance.
  2. Enter a unique learning instance name so that you can identify it easily in the Learning Instances list, and then select the other options as follows:
    Create a learning instance for semi-structured document with the generative AI capability
    1. Description (optional): Add a meaningful description that summarizes the use of the learning instance.
    2. Document Type: Select from the list of available semi-structured document types, such as Invoices, User-defined, Arrival Notice, Bill of Lading, Packing Lists, and Waybill.
      When you select one of these document types, the generative AI-driven data extraction feature is enabled in addition to the out-of-the-box Improve accuracy using validation capability, which is based on the feedback sent to the system from the changes users make in the Validator during validation. This combination of user-validation feedback and GenAI is important for semi-structured documents because it improves data extraction results.
    3. Language: English
      Currently, only English is supported.
    4. Locale: Select the locale of the documents.
      The locale is based on the language and the country where the document originates.
    5. Provider: Automation Anywhere (User-defined)
    6. OCR Provider: Google Vision OCR or ABBYY OCR
      You can choose either of the two supported OCR providers.
  3. Click Next to begin creating form and table fields for the learning instance. From v32, the generative AI capability is available for both form and table fields. You can use the GenAI capability in addition to the default custom alias support. See Create a learning instance in Document Automation, step 9 for details on adding aliases for a field.
    Document Automation uses custom aliases and the feedback capability by default for semi-structured documents. Queries for fields with low confidence or missing data are passed to generative AI for extraction.
    Learning instance for semi-structured documents with generative AI enabled search query
  4. For table fields, use the generative AI capability for column identification, which focuses data extraction on specific table columns. GenAI identifies a table column based on the defined search query without the need to train documents, and works as an out-of-the-box feature in Document Automation. You can therefore use the GenAI-enabled search query to identify the column and then extract the data for a field from that column using the Document Automation extraction model.
  5. Add a Field name, which must be specific to the data point you want to extract, and a Field label, which is used to create a default search query. Then select a Data type to define the data structure of the field value.
    You can select Text, Number, Date, or Address from the Data type drop-down. For details on creating form fields, see Create a learning instance in Document Automation, step 10.
  6. Set each form and table field to Required or Optional. When you use the generative AI capability, the Confidence field is grayed out.
  7. Optionally, use the Extract field using pattern capability for extraction.
  8. In the Search query for generative AI model section, either keep the system-generated query or add a custom query.
    For example, for an address field the default generative AI query would be ‘What is the Home address?’. You can customize the query to ‘What is the Home address with city and state?’.
  9. Define the Field Rules and Document Rules for the form and table fields, and click Create to finish creating the learning instance. For details on defining field and document rules, see Validation rules in Document Automation.
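
The field settings in steps 5 through 8 can be pictured with a small Python sketch. This is an illustration only, not a Document Automation API: the FieldDefinition class, the needs_genai helper, and the 0.8 threshold are all hypothetical names and values, and the "What is the <Field label>?" pattern is an assumption inferred from the address example in step 8.

```python
from dataclasses import dataclass
from typing import Optional

# Data types named in step 5 of the procedure.
DATA_TYPES = {"Text", "Number", "Date", "Address"}


@dataclass
class FieldDefinition:
    """Hypothetical stand-in for one form or table field (not a product API)."""
    name: str                           # specific to the data point to extract
    label: str                          # used to build the default search query
    data_type: str = "Text"
    required: bool = False
    custom_query: Optional[str] = None  # optional override of the default query

    def __post_init__(self) -> None:
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"Unsupported data type: {self.data_type}")

    @property
    def search_query(self) -> str:
        # Assumption: the system-generated query follows "What is the <label>?".
        if self.custom_query:
            return self.custom_query
        return f"What is the {self.label}?"


def needs_genai(value: Optional[str], confidence: float, threshold: float = 0.8) -> bool:
    """Assumed routing rule: send a field to GenAI when its value is missing
    or its confidence is below a (hypothetical) threshold."""
    return value is None or value.strip() == "" or confidence < threshold


address = FieldDefinition(name="home_address", label="Home address", data_type="Address")
print(address.search_query)  # default, system-style query

address.custom_query = "What is the Home address with city and state?"
print(address.search_query)  # customized query
```

A more specific custom query, such as the one that asks for city and state, narrows what the model is asked to return for that field.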

Next steps

  1. Publish the learning instance to the public repository so that it can be used in public mode to extract data from real documents and so that validators can manually validate documents. See Publish the learning instance to production.
  2. On the AI > Document Automation list page, find the learning instance you just created and published, and click Process to begin uploading documents for processing and data extraction. See Process documents in Document Automation.
  3. Open the CSV file that contains the extracted data and compare it with the processed document to confirm that the fields with GenAI-enabled search queries have extracted data accurately.
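
The comparison in step 3 can also be scripted. The following is a minimal sketch that assumes the extracted CSV has one column per field and one row per document; the column names, the sample values, and the compare_extraction helper are hypothetical and should be adapted to the actual output of your learning instance.

```python
import csv
import io


def compare_extraction(csv_text: str, expected: dict) -> list:
    """Return (column, expected, actual) tuples for every field in the first
    data row whose extracted value differs from the expected value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader, None)
    if row is None:
        # No data row at all: every expected field counts as a mismatch.
        return [(column, value, None) for column, value in expected.items()]

    mismatches = []
    for column, expected_value in expected.items():
        actual = (row.get(column) or "").strip()
        if actual != expected_value.strip():
            mismatches.append((column, expected_value, actual))
    return mismatches


# Hypothetical sample standing in for the downloaded CSV file.
sample_csv = (
    "invoice_number,home_address\n"
    'INV-001,"12 Main St, Springfield, IL"\n'
)
expected = {
    "invoice_number": "INV-001",
    "home_address": "12 Main St, Springfield, IL",
}

for column, want, got in compare_extraction(sample_csv, expected):
    print(f"{column}: expected {want!r}, extracted {got!r}")
```

An empty result means every checked field matched. To check a real file, read its contents with open(path, newline="", encoding="utf-8").read() and pass the text to compare_extraction.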