Document Extraction package updates

Review the updates in released versions of the Document Extraction package, such as new and enhanced features, fixes, and limitations. The page also lists the release date of each version and the compatible Control Room and Bot Agent versions.

Versions summary

The following table lists the versions of Document Extraction package released either with an Automation 360 release or as a package-only release (in descending order of release dates). Click the version link for information about updates in that package version.
Version | Release date | Release type | Bot Agent version | Control Room build
3.32.26 | 18 April 2024 | Package-only; post Automation 360 v.32 release | 21.252 or later | 19223 or later
3.32.23 | 5 April 2024 | With Automation 360 v.32 (On-Premises) release | 21.252 or later | 19223 or later
3.32.22 | 21 March 2024 | With Automation 360 v.32 (Sandbox) release | 21.252 or later | 19223 or later
3.31.22 | 26 January 2024 | Package-only; post Automation 360 v.31 release | 21.252 or later | 19223 or later
3.31.17 | 22 December 2023 | Package-only; post Automation 360 v.31 (Sandbox) release | 21.252 or later | 19223 or later
3.31.16 | 6 December 2023 | With Automation 360 v.31 (Sandbox) release | 21.252 or later | 19223 or later
3.31.15 | 28 November 2023 | With Automation 360 v.30 release | 21.252 or later | 19223 or later
3.31.13 | 16 November 2023 | Package-only; post Automation 360 v.30 release | 21.252 or later | 19223 or later
3.30.24 | 21 September 2023 | Package-only; post Automation 360 v.30 (Sandbox) release | 21.252 or later | 19223 or later
3.30.22 | 6 September 2023 | With Automation 360 v.30 (Sandbox) release | 21.252 or later | 19223 or later
3.30.21 | 21 August 2023 | Package-only; post Automation 360 v.29 | 21.98 or later | 15345 or later
3.30.19 | 16 August 2023 | Package-only; post Automation 360 v.29 | 21.98 or later | 15345 or later
3.29.17 | 17 July 2023 | Package-only; post Automation 360 v.29 release | 21.98 or later | 15345 or later
3.29.14 | 6 June 2023 | With Automation 360 v.29 (Sandbox) release | 21.98 or later | 15345 or later
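The "or later" minimums in the table can be checked with a small script. The following is a minimal sketch, not a product tool: the REQUIREMENTS dictionary copies only a few rows from the table, the function names are hypothetical, and it assumes that Bot Agent versions are ordered component by component (so 21.252 is later than 21.98, which a plain string or decimal comparison would get wrong).

```python
from __future__ import annotations

# Minimum requirements copied from a few rows of the versions table:
# package version -> (minimum Bot Agent version, minimum Control Room build)
REQUIREMENTS = {
    "3.32.26": ("21.252", 19223),
    "3.31.22": ("21.252", 19223),
    "3.30.21": ("21.98", 15345),
    "3.29.14": ("21.98", 15345),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version such as '21.252' into integers: (21, 252)."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(package: str, bot_agent: str, control_room_build: int) -> bool:
    """Return True if both installed versions meet the package's minimums."""
    min_agent, min_build = REQUIREMENTS[package]
    return (
        parse_version(bot_agent) >= parse_version(min_agent)
        and control_room_build >= min_build
    )


print(is_compatible("3.32.26", "21.252", 19223))  # True
print(is_compatible("3.32.26", "21.98", 15345))   # False
```

Comparing integer tuples rather than raw strings is the design choice that matters here, because the minor component grows past two digits.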
Note:
  • To download an individual package that was updated in an Automation 360 release (when you want only the package and not the full release), use this URL:

    https://aai-artifacts.my.automationanywhere.digital/packages/<package-file-name>-<version.number>.jar

  • For the Document Extraction package, the naming convention is: bot-command-iqbot-extraction360-<version-number>-full.jar

    For example, bot-command-iqbot-extraction360-3.31.22-full.jar

For detailed steps on downloading a package and manually adding it to the Control Room, see Add packages to the Control Room.
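The download URL for a given version can be assembled from the pattern and naming convention in the note above. This is a minimal sketch: the function names are hypothetical, and it assumes that the file name from the naming convention (including the -full suffix) is the complete final segment of the URL.

```python
# Base path and file-name prefix taken from the note above.
BASE_URL = "https://aai-artifacts.my.automationanywhere.digital/packages"
PACKAGE_PREFIX = "bot-command-iqbot-extraction360"


def package_file_name(version: str) -> str:
    """Build the .jar file name for a Document Extraction package version."""
    return f"{PACKAGE_PREFIX}-{version}-full.jar"


def package_url(version: str) -> str:
    """Build the download URL (assumed layout: base path + file name)."""
    return f"{BASE_URL}/{package_file_name(version)}"


print(package_file_name("3.31.22"))
# bot-command-iqbot-extraction360-3.31.22-full.jar
print(package_url("3.31.22"))
```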

3.32.26

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
When you process a document with Google Document AI, the extraction bot now executes successfully for the Portuguese language and sends the document to straight-through processing (STP) or the validator.

When you process a document with handwriting or signature objects, these objects are now included in the final output JSON file.

Previously, because of the high confidence threshold set for signatures, handwriting or signature objects were not included in the final output JSON file.

When you process a document using Google Custom Document Extractor (CDE) with a bring your own key (BYOK) setup and the corresponding processor uses a foundational model, document processing no longer fails because of a transformational failure.

With the improved table structure model, specifically for column detection in complex tables, you now get more accurate extraction results.

Service Cloud Case ID: 02110860

For learning instances bridged from IQ Bot to Document Automation, when validation feedback is enabled and applied and the user processes the next document, the data from all the pages is now extracted successfully without any merged rows.

3.32.23

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
Fixed the vulnerabilities reported in the security scan.

3.32.22

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
With the improved document table detection model, which adds an End of table indicator, you can now extract table data from all the pages for the selected language. This reduces missing-table and missing-last-row extraction issues.

Service Cloud Case ID: 02065073

With improved table extraction, unstructured tables no longer show junk values and the table data is now extracted successfully.

Users can now save the validation feedback in their Document Automation environment when a proxy is enabled on the Bot Agent machine.

Service Cloud Case ID: 02092484

With Google Vision OCR and a proxy enabled, document extraction no longer fails with an error message for unstructured documents.

Service Cloud Case ID: 02104409

3.31.22

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
After adding validation feedback to the learning instance, the document extraction no longer fails with an error message.

Previously, the document extraction failed when the validation check box was selected.

After adding validation feedback to the learning instance, the feedback is saved for all the tables across all the pages in the document, and data is extracted correctly from all the pages.

Previously, the feedback was not saved for all the pages.

Service Cloud Case ID: 01995135, 02093575, 02093389

After adding the validation feedback, if the table IDs match, data from all the tables on every page is now extracted and shown in the validator.

Previously, in such cases, some pages were skipped and the validator did not show data from all the pages.

When you apply the advanced training settings and swap columns, map all the column values correctly so that data is extracted into separate columns. You can either re-map all the column cells or remove all the other incorrect cell rows while keeping the first two rows intact. The column must not contain any incorrect cells, and every column cell must have the correct value.

Previously, in such cases, the data from two columns was extracted in a single column.

You can now extract the table field values in the correct order, and the multi-row extraction issue no longer occurs. Also, when a table has only one row, you can use the End of table indicator feature to extract multi-line data after applying feedback.
Note: For single-row tables, the best practice is to use the End of table indicator feature. Otherwise, extraction might be partial in specific scenarios.

Service Cloud Case ID: 02091013

After you train a document and then process the same document with Google Vision OCR, the feedback is now saved and the required data is extracted.

Previously, in such cases, you could not process a specific type of document and had to validate the document manually each time.

Service Cloud Case ID: 02098682

3.31.17

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
With Google Vision OCR, you can now process documents successfully without a Google Document AI license, and no error message is generated.

Previously, the package requested a Google Document AI license to process the documents and generated an error during extraction. As a result, you could not extract documents with Google Vision OCR.

Service Cloud Case ID: 02097428, 02096992, 02097798, 02097157, 02098378, 02098563, 02094573

3.31.16

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
When users create a learning instance with Google Document AI (BYOK) and an authenticated proxy, document extraction no longer fails for documents with more than 10 pages.

Previously, in such cases, extraction failed with an error message and users were not able to process the documents.

3.31.15

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
If Document rules contain multiple conditions that use the AND operator, with or without a group, an appropriate error message is now displayed. Also, the corresponding action is now applied to the fields.

3.31.13

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's Changed
With improved extraction of unstructured documents in Document Automation, you can:
  • Process complex queries effectively.
  • Validate documents with improved navigation to the relevant page.
Fixes
With improved table extraction using the ABBYY OCR engine, heuristic feedback now works properly.
  • For the German language, invoice extraction works correctly after applying feedback, and all the table data is extracted.
  • For the Spanish language, table data is extracted correctly from the invoice document.
  • For the English language, the invoice data is extracted from all the pages with ABBYY OCR.

Service Cloud Case ID: 01995901

When a user extracts the table data from a PDF file in which a table spans multiple pages, the data from all the pages is now extracted successfully after applying the heuristic feedback.

Previously, users were not able to extract data from the second page of a PDF file in which a table spans multiple pages.

Service Cloud Case ID: 01996536

With extraction starting from the first page for all the fields, the heuristic feedback now works properly for capturing multi-line table data and generates the correct output.

Previously, the multi-line table data was not extracted even after providing the heuristic feedback. As a result, the output was not generated properly.

Service Cloud Case ID: 01944805, 01946809, 01952836, 01957090, 01975800, 01981088

For Microsoft Standard Forms, table extraction no longer fails when cells are empty, and users can extract the document successfully.

When a user imports a learning instance and processes the documents, the extracted document shows the correct order of words for dates on all the pages.

When a user imports a learning instance and processes the documents, all the values are displayed in the table after extraction.

Previously, in such cases, the system-identified region (SIR) was highlighted but an empty value was shown in the table.

When a user imports a .dw file with heuristic feedback and processes a document that contains a negative (-) value in the last row, the document is extracted correctly without skipping the negative value in the last row.

Previously, in such cases, the last row was skipped, resulting in either data loss or incorrect processing.

When a user processes a document that contains a table, the extraction finishes successfully without the DOCUMENT_PARTIALLY_FAILED or Extraction Timeout error message.

Previously, in such cases, some documents were not extracted because multiple detections from the same table caused a table size (max() arg) issue.

When a user imports a learning instance and processes the documents, all the rows are extracted separately from all the pages.

Previously, rows from the second page were merged into one row.

Limitations
When you use Google Vision OCR, table detection or extraction does not work.

Workaround: Use the ABBYY OCR engine.

Service Cloud Case ID: 01995901

In specific cases where tables span multiple pages without headers on all the pages (headerless pages), the data might not be extracted from all the pages after you apply the feedback.

3.30.24

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
Fixes
Users can now view the extracted data from the second row correctly by using the heuristic feedback.

For the Purchase Order document type, you can now extract the table field values correctly from all the pages.

The generated feedback file no longer shows any error message, and users can process documents successfully.

3.30.22

  • Compatible Bot Agent version: 21.252 or later
  • Compatible Control Room version: 19223 or later
What's New
Document Automation provides improved extraction through the new Get document data and Update document data actions. You can use these actions to apply custom logic for data manipulation and validation to maximize straight-through processing (STP) and reduce manual verification effort.

3.30.21

  • Compatible Bot Agent version: 21.98 or later
  • Compatible Control Room version: 15345 or later
Fixes
This Document Extraction package release is a patch to fix the '501: DOCUMENT_PARTIALLY_FAILED' error that occurred while processing some documents.

3.30.19

  • Compatible Bot Agent version: 21.98 or later
  • Compatible Control Room version: 15345 or later
Fixes
The Document Extraction package provides improved extraction capability for complex table header columns.
  • Scenario 1: Extracting data from table column headers with multiple headers merged into a single column.
  • Scenario 2: Extracting data from table column headers with multiple split sub-headers.
Follow these steps to enable improved table header data extraction:
  1. Create or edit a learning instance.
  2. To add or edit the table fields, navigate to the Table fields tab, and click Add a field > Field Properties.
  3. Add each table header as a separate table field. For example:

    Scenario 1: Add the column header and each merged sub-header as a separate table field. Using the screenshot as a reference, to extract data from the three merged column header fields, you would create three separate table fields: CGST with alias CGST, SGST with alias SGST, and CESS with alias CESS.

    Example of column header with multiple sub-headers.

    Scenario 2: Add the column header and each split sub-header as a separate table field. Similar to the above example, for a column header CGST with split sub-headers Rate and AMT, you would need to create two separate table fields CGST Rate with alias CGST Rate, and CGST AMT with alias CGST AMT.

    Example of column header with multiple split sub-headers.

  4. Click Submit to save your updates.
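The field names and aliases from the two scenarios can be listed ahead of time before you enter them in the Table fields tab. The sketch below is illustrative only: the function names and the "name" and "alias" dictionary keys are hypothetical and are not part of any Document Automation API. It only mirrors the naming shown in the examples above.

```python
from __future__ import annotations


def merged_header_fields(headers: list[str]) -> list[dict[str, str]]:
    """Scenario 1: one table field per merged header, alias equal to the name."""
    return [{"name": header, "alias": header} for header in headers]


def split_header_fields(header: str, sub_headers: list[str]) -> list[dict[str, str]]:
    """Scenario 2: prefix each split sub-header with its parent column header."""
    fields = []
    for sub_header in sub_headers:
        label = f"{header} {sub_header}"
        fields.append({"name": label, "alias": label})
    return fields


# Scenario 1: three merged column header fields.
for field in merged_header_fields(["CGST", "SGST", "CESS"]):
    print(field["name"], "->", field["alias"])

# Scenario 2: column header CGST with split sub-headers Rate and AMT.
for field in split_header_fields("CGST", ["Rate", "AMT"]):
    print(field["name"], "->", field["alias"])
```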

3.29.17

  • Compatible Bot Agent version: 21.98 or later
  • Compatible Control Room version: 15345 or later
Fixes
The Document Extraction package has extraction improvement fixes for both form and table fields.

3.29.14

  • Compatible Bot Agent version: 21.98 or later
  • Compatible Control Room version: 15345 or later
What's New
Document Automation provides improved extraction through heuristic feedback with a focus on complex scenarios, such as multiple tables. Additionally, there are extraction improvements for both form fields and out-of-the-box performance (specifically for table fields).