Document Extraction package updates

Review the updates in released versions of the Document Extraction package, such as new and enhanced features, fixes, and limitations. This page also lists the release date of each version and the compatible Control Room and Bot Agent versions.

Versions summary

The following sections describe the versions of the Document Extraction package released either with an Automation 360 release or as a package-only release, in descending order of release date.
Note:
  • To download an individual package (for example, a package updated in an Automation 360 release when you want only that package), use this URL:

    https://aai-artifacts.my.automationanywhere.digital/packages/<package-file-name>-<version.number>.jar

  • For the Document Extraction package, the naming convention is: bot-command-iqbot-extraction360-<version-number>-full.jar

    For example, bot-command-iqbot-extraction360-3.31.22-full.jar
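If you script package downloads, the following minimal sketch builds the file name from the naming convention above and joins it to the base path from the URL template. It assumes that the complete file name, including the -full.jar suffix, is appended directly to the packages/ base path; confirm the exact URL for your release before relying on it. The function names are hypothetical helpers, not part of any Automation Anywhere API.

```python
import re

# Base path taken from the download URL template in the note above.
BASE_URL = "https://aai-artifacts.my.automationanywhere.digital/packages/"


def extraction_package_filename(version: str) -> str:
    """Build the Document Extraction package file name, for example for '3.31.22'."""
    # Accept only dotted numeric versions such as 3.31.22.
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"unexpected version format: {version!r}")
    return f"bot-command-iqbot-extraction360-{version}-full.jar"


def extraction_package_url(version: str) -> str:
    # Assumption: the full file name (with the '-full.jar' suffix) is
    # appended directly to the base path.
    return BASE_URL + extraction_package_filename(version)


print(extraction_package_url("3.31.22"))
```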

For detailed steps on downloading a package and manually adding it to the Control Room, see Add packages to the Control Room.
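Before you add a manually downloaded package, you might want to check the compatible Bot Agent and Control Room versions listed for it on this page. The following sketch shows one way to do this, assuming that Bot Agent versions compare component by component as integers and that Control Room versions are plain integer build numbers. MIN_VERSIONS copies only two entries from this page and is for illustration.

```python
# Minimum versions copied from two entries on this page; extend as needed.
MIN_VERSIONS = {
    "3.36.10": {"bot_agent": "21.252", "control_room": 19223},
    "3.30.21": {"bot_agent": "21.98", "control_room": 15345},
}


def parse_dotted(version: str) -> tuple:
    """Split a dotted version such as '21.252' into integers: (21, 252)."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(package_version: str, bot_agent: str, control_room_build: int) -> bool:
    """Return True if both installed versions meet the package's minimums."""
    required = MIN_VERSIONS[package_version]
    # Compare dotted versions component by component as integers, so that
    # 21.98 < 21.252 (a plain string comparison would get this wrong).
    agent_ok = parse_dotted(bot_agent) >= parse_dotted(required["bot_agent"])
    # Assumption: Control Room versions are plain integer build numbers.
    room_ok = control_room_build >= required["control_room"]
    return agent_ok and room_ok


print(is_compatible("3.36.10", "21.98", 19223))  # False: Bot Agent too old
print(is_compatible("3.30.21", "21.98", 15345))  # True
```

Comparing the versions as integer tuples matters here: as strings, "21.98" sorts after "21.252", which would wrongly report Bot Agent 21.98 as compatible with version 3.36.10.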

3.36.10

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's new
Advanced data extraction using prompt tag

The @AdvancedExtraction prompt tag lets you use advanced vision-powered generative AI models for better data extraction. To use these models for table data extraction, add the tag at the end of the prompt of a single table field per table, or in the table prompt.

For more information, see Using prompt tags in generative AI prompts.
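Prompts are normally entered in the learning instance, but if you prepare table field prompts programmatically, the placement rule above can be sketched as follows. The helper names and the field names are hypothetical, and the sketch assumes a single space separates the prompt text from the tag.

```python
TAG = "@AdvancedExtraction"


def with_advanced_extraction(prompt: str) -> str:
    """Return the prompt with the tag at the end, without duplicating it."""
    prompt = prompt.rstrip()
    if prompt.endswith(TAG):
        return prompt
    # Assumption: a single space separates the prompt text from the tag.
    return f"{prompt} {TAG}"


def tagged_fields(field_prompts: dict) -> list:
    """List the table fields whose prompt ends with the tag."""
    return [name for name, p in field_prompts.items() if p.rstrip().endswith(TAG)]


# Hypothetical table with three fields; only one field carries the tag.
table = {
    "item_description": "Extract the description of each line item",
    "quantity": "Extract the quantity of each line item",
    "unit_price": "Extract the unit price of each line item",
}
table["item_description"] = with_advanced_extraction(table["item_description"])
print(tagged_fields(table))
```

Making the helper idempotent keeps the tag from being appended twice if the same prompt is processed again, so each table keeps exactly one tagged field.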

What's changed
Improved table extraction model

The table extraction model is updated to extract data from table headers and table fields and to improve data extraction from tables when using validation feedback.

Limitations
You cannot configure or use bring your own license (BYOL) for the OpenAI o1-mini generative AI model.

3.36.7

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's new
Support for test mode and PDFBox features

The Document Extraction package now supports test mode and PDFBox features.

For more information, see Test learning instances and Create a learning instance in Document Automation.

3.35.14

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's new
Data extraction using vision-powered generative AI models

Vision-powered generative AI models are integrated in Document Automation to process documents with visually complex structures, for example, to recognize checkboxes and detect signatures.

Vision-powered generative AI models provide the following benefits:

Seamless data extraction
Extracts data from complex tables with nested rows, merged columns, and sections. Recognizes and captures selection elements such as checkboxes.
Developed for real-world use cases
Overcomes challenges in extracting data from various document types such as invoices, purchase orders, healthcare documents, and supply chain documents.
Effortless setup
Uses pre-trained models that work out of the box, with search queries used to identify and extract information.

For more information, see Vision-powered generative AI data extraction and Using prompt tags in generative AI prompts.

What's changed
Improved accuracy of data extraction (Service Cloud Case ID: 02113080)

The accuracy of extracting data using the Extract data action is improved when you use vision-powered generative AI models. See Vision-powered generative AI data extraction.

Improved table extraction model (Service Cloud Case ID: 02159567, 02154057, 02145073, 02163032, 02151987, 02175105)

The table extraction model is updated to process documents that have complex headers in tables and to extract data from tables on all pages.

3.35.7

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's changed
Improved table extraction model (Service Cloud Case ID: 02141734)

The table extraction model is updated to process documents that have complex headers in tables.

Fixes
You can now extract data from table headers after providing validation feedback.

Previously, only partial data was extracted in certain scenarios.

Service Cloud Case ID: 02155613

You can now process documents for data extraction without encountering a storage-related error.

Previously, a storage-related error was displayed when processing certain documents.

Service Cloud Case ID: 02141163, 02132605

Fixed security vulnerabilities. For more information, click the release download link and view the Security & Compliance reports on the A-People Downloads page (login required).

3.34.7

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's changed
Improved table extraction model

The table extraction model is updated to enhance the end-of-table indicator option.

Service Cloud Case ID: 02145073, 02154694, 02160765

Fixes

When you create a learning instance with the document type set to Unstructured document and the language set to Swedish, the Document Extraction package now successfully extracts data from Unstructured documents in Swedish.

You can now provide queries in the Search query for generative AI model option and extract data successfully from packing list documents without seeing an error.

Previously, an error was displayed when you provided certain queries in such a scenario.

Service Cloud Case ID: 02154341, 02154706, 02173044

Fixed security vulnerabilities. For more information, click the release download link and view the Security & Compliance reports on the A-People Downloads page (login required).

3.33.18

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's new

Out-of-the-box Anthropic integration

You can now use the Anthropic generative AI provider directly without any additional configuration.

For more information, see Create a learning instance in Document Automation.

What's changed
Improved table extraction model

The table extraction model is updated to improve data extraction for tables that span multiple pages in unstructured document types.

Fix

When you extract data using a generative AI provider, fields now return the appropriate values if the search query requests the response in JSON format.

Previously, certain fields returned empty values in such a scenario.

3.33.13

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's changed
Improved table extraction model (Service Cloud Case ID: 02122434)

The table extraction model is updated to improve the table structure extraction and error handling.

Fixes
You can now provide validation feedback in the standard vendor_name form field in a learning instance to successfully extract vendor names.

Previously, you encountered an error in such a scenario.

Service Cloud Case ID: 02124772, 02122434, 02126627, 02129868, 02132605

For documents that contain multiple pages and tables, the primary column and end-of-table indicator fields for all the tables in the advanced training settings of the validator are updated appropriately after providing validation feedback.

Previously, the primary column and end-of-table indicator fields were not updated for all tables.

Validation feedback now works when you use learning instances to process documents that contain multiple tables.
Limitation
Data extraction will fail in the following scenario:
  • You have created a learning instance where the document type is set to Unstructured document and the language is set to Swedish.
  • The extraction bot for the learning instance is using the Document Extraction package version 3.33.13.

3.33.11

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
You can now process documents using a learning instance when:
  • The learning instance was created with checkbox fields in IQ Bot.
  • The learning instance is imported to Document Automation using the IQ Bot - Document Automation Bridge package.
  • The Improve accuracy using validation option is enabled for the learning instance in Document Automation.

Previously, data extraction failed in such a scenario.

3.32.26

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later

3.32.23

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
Fixed the vulnerabilities reported in the security scan.

3.32.22

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later

3.31.22

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later

3.31.17

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
With Google Vision OCR, you can now process documents successfully without a Google Document AI license, and no error message is generated.

Previously, a Google Document AI license was required to process the documents, and an error was generated during extraction. As a result, you could not extract data from documents with Google Vision OCR.

Service Cloud Case ID: 02097428, 02096992, 02097798, 02097157, 02098378, 02098563, 02094573

3.31.16

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
When you create a learning instance with Google Document AI (BYOK) and an authenticated proxy, document extraction no longer fails for documents with more than 10 pages.

Previously, in such cases, extraction failed with an error message and you could not process the documents.

3.31.15

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
If Document rules contain multiple conditions joined by the AND operator, with or without a group, an appropriate error message is now displayed and the corresponding action is applied to the fields.

3.31.13

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's changed
With improved extraction of unstructured documents in Document Automation, you can:
  • Process complex queries effectively.
  • Validate documents with improved navigation to the relevant page.
Limitations
When you use Google Vision OCR, table detection and extraction do not work.

Workaround: Use the ABBYY OCR engine.

Service Cloud Case ID: 01995901

In specific cases where a table spans multiple pages and not all pages have headers (headerless pages), data might not be extracted from all pages after you apply feedback.

3.30.24

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
You can now view the extracted data from the second row correctly by using heuristic feedback.
For the Purchase Order document type, you can now extract the table field values correctly from all pages.
The generated feedback file no longer shows an error message, and you can process documents successfully.

3.30.22

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's new
Document Automation provides improved extraction through the new Get document data and Update document data actions. You can use these actions to apply custom logic for data manipulation and validation, reducing manual verification effort.

3.30.21

  • Compatible Bot Agent version: 21.98 or later
  • Compatible Control Room version: 15345 or later
Fixes
This Document Extraction package release is a patch to fix the '501: DOCUMENT_PARTIALLY_FAILED' error that occurred while processing some documents.

3.30.19

  • Compatible Bot Agent version: 21.98 or later
  • Compatible Control Room version: 15345 or later
Fixes
The Document Extraction package provides improved extraction capability for complex table header columns.
  • Scenario 1: Extracting data from table column headers with multiple headers merged into a single column.
  • Scenario 2: Extracting data from table column headers with multiple split sub-headers.
Follow these steps to enable improved table header data extraction:
  1. Create or edit a learning instance.
  2. To add or edit the table fields, navigate to the Table fields tab and click Add a field > Field Properties.
  3. Add each table header as a separate table field. For example:

    Scenario 1: Add the column header and each merged sub-header as a separate table field. For example, to extract data from three merged column header fields, you would create three separate table fields: CGST with alias CGST, SGST with alias SGST, and CESS with alias CESS.

    Example of column header with multiple sub-headers.

    Scenario 2: Add the column header and each split sub-header as a separate table field. Similarly, for a column header CGST with split sub-headers Rate and AMT, you would create two separate table fields: CGST Rate with alias CGST Rate, and CGST AMT with alias CGST AMT.

    Example of column header with multiple split sub-headers.

  4. Click Submit to save your updates.
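The naming pattern in the two scenarios above can be summarized in a short sketch. Table fields are configured in the learning instance, not in code; the TableField class and the helper functions here are hypothetical and only model the field names and aliases from the CGST examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableField:
    """Hypothetical model of a table field: its name and alias."""
    name: str
    alias: str


def merged_header_fields(sub_headers):
    # Scenario 1: each merged sub-header becomes its own table field,
    # using the sub-header text as both name and alias.
    return [TableField(h, h) for h in sub_headers]


def split_header_fields(header, sub_headers):
    # Scenario 2: combine the column header with each split sub-header,
    # for example CGST + Rate -> "CGST Rate".
    return [TableField(f"{header} {s}", f"{header} {s}") for s in sub_headers]


print(merged_header_fields(["CGST", "SGST", "CESS"]))
print(split_header_fields("CGST", ["Rate", "AMT"]))
```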

3.29.17

  • Compatible Bot Agent version: 21.98 or later
  • Compatible Control Room version: 15345 or later
Fixes
The Document Extraction package has extraction improvement fixes for both form and table fields.

3.29.14

  • Compatible Bot Agent version: 21.98 or later
  • Compatible Control Room version: 15345 or later
What's new
Document Automation provides improved extraction through heuristic feedback, with a focus on complex scenarios such as documents with multiple tables. Additionally, extraction is improved for form fields, and out-of-the-box performance is improved, specifically for table fields.