Create a learning instance in Document Automation

Updated: 2025/01/09

Begin processing documents by creating a learning instance to extract data from various supported document types. A learning instance is a structure that holds information such as document type, language, fields to be extracted, and so on.

Prerequisites

To create a learning instance, you must be a Learning instance creator user. See Document Automation users.
For document types that support OCR, the default OCR engine is ABBYY FineReader Engine. Alternatively, you can create a learning instance that processes documents using Google Vision OCR.
For the Standard Forms document type, ensure that you have created a custom extraction model. See Create a custom extraction model using Standard Forms.

Procedure

From the Control Room home page, navigate to AI > Document Automation, and click Create Learning Instance.
Enter a name and description for the learning instance.
Document Automation does not allow duplicate learning instance names, so the name you provide must be unique.
Select an appropriate document type.

Note: Use the User-defined document type to process documents that are visually similar to invoices, such as purchase orders and sales orders, which contain key-value pairs and a table structure. In this document type, you create and configure all of the form and table fields.
Select the language.
For details about the languages supported in Document Automation, see Languages supported in Document Automation.
If the document type that you selected in step 3 is used in a parser configuration, the language selected during parser configuration is auto-selected. In addition, the locale list displays language options based on the auto-selected language.
Select a provider.
If you selected the English language in step 4, Automation Anywhere (Pre-trained) is auto-selected.
If the document type that you selected in step 3 is used in a parser configuration, the configured (third-party) parser is auto-selected as the provider.
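The two auto-selection rules for the provider can be sketched as a small decision function. This is only an illustrative model, not Control Room code: the function name, its parameters, and the precedence of the parser rule over the English rule are assumptions.

```python
from typing import Optional

PRETRAINED = "Automation Anywhere (Pre-trained)"


def auto_selected_provider(language: str,
                           parser_provider: Optional[str] = None) -> Optional[str]:
    """Return the provider that would be pre-selected, or None if no rule applies.

    parser_provider is the (hypothetical) name of a third-party parser that was
    configured for the selected document type, if any.
    """
    # Rule from the text: a document type used in a parser configuration
    # selects the configured parser. Checking it first is an assumption.
    if parser_provider is not None:
        return parser_provider
    # Rule from the text: English selects the pre-trained provider.
    if language.strip().lower() == "english":
        return PRETRAINED
    # No auto-selection rule applies; the user chooses a provider manually.
    return None
```

For example, `auto_selected_provider("English")` returns the pre-trained provider, while a language with no configured parser returns `None`.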
Optional: Select the OCR provider. By default, Document Automation processes documents using ABBYY FineReader Engine.
Users with a Cloud Control Room can choose to process documents using Google Vision OCR.
Optional: Select the Improve accuracy using validation option to send feedback to the system and improve extraction results. For more information, see Improving extraction accuracy through validation.

Note: The Improve accuracy using validation option is available only for selected document types.
Optional: Select the Generative AI-driven data extraction option to use the generative AI capabilities for extraction. For more information, see Document Automation - Data extraction using generative AI.
Select one of the following generative AI providers:
- OpenAI: Provides access to OpenAI's language models for content generation, summarization, image understanding, semantic search, and natural language to code translation. This provider is available through an embedded license (no additional licenses required) and bring your own license (BYOL). If you are using BYOL, ensure that you configure the additional settings for OpenAI in the extraction bot to use this provider. See Extract data action.
- Anthropic: Provides the Anthropic generative AI models available through AWS and GCP for data extraction in Document Automation. This offering gives you the flexibility to select the generative AI model depending on the cloud provider that your company has certified.
  Anthropic provides the following advantages:
  - Efficient processing of large, unstructured documents
  - Support for documents in English and other languages
  - Faster document processing with better data extraction accuracy
  If you are using BYOL, you must configure the Anthropic Claude model on Google Vertex AI or Amazon Bedrock service and then configure the additional settings in the extraction bot to use this provider. See Extract data action.
Note:
- To use the Generative AI-driven data extraction option, ensure that you are using the Document Extraction package version 3.31.16 or later. See Document Extraction package updates.
- The Generative AI-driven data extraction option is available only for selected document types. For some document types, the option is enabled by default and cannot be disabled. For such document types, you can only choose the generative AI provider.
- When you update from a previous release to v.33 or later, OpenAI is set as the default data extraction provider.
- If you select Anthropic as the data extraction provider in a learning instance but do not configure the required Anthropic settings in the corresponding extraction bot, you will see an error when processing documents.
- If you select Anthropic as the provider for a learning instance but configure the Anthropic settings incorrectly or select a different provider in the corresponding extraction bot, you will see an error when processing documents.
- If you process documents using OpenAI and then switch to Anthropic for data extraction, only documents processed after the switch use Anthropic. Data for previously processed documents remains as extracted by OpenAI.
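The Anthropic error conditions described above (missing settings, or a different provider selected in the extraction bot) can be modeled as a pre-flight check. The sketch below is hypothetical: the `LearningInstance` and `ExtractionBotSettings` classes and their attributes are illustrative stand-ins, not part of any Automation Anywhere API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LearningInstance:
    name: str
    genai_provider: str  # "OpenAI" or "Anthropic"


@dataclass
class ExtractionBotSettings:
    provider: str
    # Hypothetical key/value settings for the Anthropic model configuration.
    anthropic_settings: Dict[str, str] = field(default_factory=dict)


def find_provider_problems(instance: LearningInstance,
                           bot: ExtractionBotSettings) -> List[str]:
    """List mismatches that, per the text, lead to errors during processing."""
    problems: List[str] = []
    if instance.genai_provider == "Anthropic":
        if bot.provider != "Anthropic":
            problems.append("extraction bot uses a different provider")
        elif not bot.anthropic_settings:
            problems.append("Anthropic settings are not configured in the bot")
    return problems
```

An empty list means this simplified model found none of the mismatches named in the notes; it cannot tell whether the settings values themselves are correct.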
Click Next.

We recommend that you open a sample document side by side with the Control Room window as you configure the form and table fields.

Note:

A form field is a type of field that occurs only one time in a document.
A table field is a type of field that reoccurs throughout a document, typically in the form of a table.
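The difference between the two field types can be pictured as a data structure: each form field holds a single value per document, while table fields repeat once per row. The following is a minimal sketch with made-up field names and sample values; it does not reflect the actual output format of Document Automation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractedDocument:
    # Form fields: one value per document.
    form_fields: Dict[str, str] = field(default_factory=dict)
    # Table fields: one dictionary per table row, keyed by field name.
    table_rows: List[Dict[str, str]] = field(default_factory=list)


def column(doc: ExtractedDocument, field_name: str) -> List[str]:
    """Collect every value of one table field across all rows."""
    return [row[field_name] for row in doc.table_rows if field_name in row]


# Illustrative sample data only.
invoice = ExtractedDocument(
    form_fields={"invoice_number": "INV-001", "invoice_date": "2025-01-09"},
    table_rows=[
        {"item_description": "Widget", "quantity": "2"},
        {"item_description": "Gadget", "quantity": "1"},
    ],
)
```

Here `invoice.form_fields["invoice_number"]` yields one value, whereas `column(invoice, "quantity")` yields one value per table row.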

Configure the form and table fields for extraction. For more details, see View and search fields.
1. Click a field to open the fields editor. For more details, see Guidelines to edit the fields and create custom aliases.
2. Hover over the menu icon to the right of a field to access the up/down arrows.
3. Use the arrows to rearrange the fields for more efficient manual validation.
  The order of the fields does not impact extraction.
To learn more about the other field attributes, see Considerations for form and table fields.
Click Add a field and specify the field details such as field name, field label, confidence, data type, date/number format, and so on. For more details, see Considerations for form and table fields.
The following image shows form and table fields configured in a learning instance:

Note: The Add a field option is not available for the Receipts document type.
Optional: On the Table fields tab, click the + icon to add a custom table at the learning instance level.
1. Enter a name for the table and click Add.
2. Click Add a field and specify the field details such as field name, field label, confidence, data type, date/number format, and so on. For more details, see Considerations for form and table fields.
  
  Note: You can also add and delete the custom table while editing a learning instance.
The custom table is displayed in the table drop-down list.

You can also view the fields from custom and default tables on the Document Rules tab, but you cannot select fields across different tables. For more details about multi-table support, see Guidelines to create or edit the custom multi-table in a learning instance.
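The field details entered for a custom table can be thought of as a small configuration record per field. The sketch below is an assumption-laden model: the class names, the attribute types, and the treatment of confidence as a 0 to 100 percentage are illustrative guesses, not the product's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldDefinition:
    field_name: str
    field_label: str
    confidence: int                # assumed: percentage threshold, 0 to 100
    data_type: str                 # assumed names, e.g. "Text", "Number", "Date"
    value_format: Optional[str] = None  # date/number format, if any

    def __post_init__(self) -> None:
        # Range check reflects the assumption above, not a documented rule.
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


@dataclass
class CustomTable:
    name: str
    fields: List[FieldDefinition] = field(default_factory=list)

    def add_field(self, definition: FieldDefinition) -> None:
        """Mirror the 'Add a field' step for this table."""
        self.fields.append(definition)


# Illustrative usage with made-up names.
shipments = CustomTable(name="Shipments")
shipments.add_field(FieldDefinition("ship_date", "Ship date", 80, "Date", "YYYY-MM-DD"))
```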
Click Create.

When a new learning instance is created, the Control Room creates a folder with the same name as the learning instance in the Automation > Document Workspace Processes folder. The folder contains two bots (extraction and download), a process, and a form. For more details, see Bots output file and folder structure.
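The folder contents described above (two bots, one process, one form) can be expressed as a simple checklist. This is a hypothetical sketch: the artifact kind labels and the idea of passing in a list of found items are assumptions, and no Control Room API is called.

```python
from collections import Counter
from typing import Dict, Iterable

# Expected contents per the text: two bots (extraction and download),
# one process, and one form.
EXPECTED_ARTIFACTS = Counter({"bot": 2, "process": 1, "form": 1})


def missing_artifacts(found_kinds: Iterable[str]) -> Dict[str, int]:
    """Return how many artifacts of each kind are still missing."""
    found = Counter(found_kinds)
    # Counter subtraction keeps only positive counts, so surplus items
    # and fully satisfied kinds are dropped from the result.
    return dict(EXPECTED_ARTIFACTS - found)
```

Passing the kinds of the items you see in the learning instance folder returns an empty dictionary when all four expected artifacts are present.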