Use advanced training settings

Use the advanced training settings to train your documents and provide additional inputs to the Document Automation extraction engine to improve table data extraction.

After extracting the document, you can use the Advanced training settings option on the validation page to set the following values:
  • Primary column: Set the primary column for row identification based on your requirements.
  • End of table indicator: Add an end of table indicator value so that the system extracts table data up to, but not including, that value.
  • Header labels: Adjust or re-map the table fields as required.
Note: This feature applies only to providers for which the Improve accuracy using validation option is available.

Prerequisites

  • The Advanced training settings option is available only if the Improve accuracy using validation option is enabled.
  • Ensure that you have the Train groups permission to provide information about header labels, end of table indicator, and a primary column used for row detection.
  • There can be only one primary column.
  • The end of table indicator is a text system-identified region (SIR).

Procedure

  1. Process a document and navigate to the validation page.
  2. Click Advanced training settings.
    Advanced training settings option in validator page
  3. Train your document to set the following values:
    1. Set the user-defined primary column for row identification.
      Setting primary column using advanced training settings
    2. Specify the end of table indicator value.
      Specifying end of table indicator for extracting data excluding the EoT text
    3. Click the required column and specify the required header name.
      Changing header value of the columns
  4. Click Submit and re-process the document.
    Note: You must click Submit to save these settings so that they take effect when the document is reprocessed.
    The document is reprocessed using the specified advanced training settings. It is then either sent to the validator again, if any fields still require validation, or the data is extracted to the Success folder as a CSV file.

Primary column

For example, after you extract a document, the multi-line table data from the Item number column is extracted into a single row, but you want it extracted into separate rows. In such cases, you can set Item number as the primary column to improve table extraction. For more details, see Example of setting primary column using advanced training settings.
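The idea of using a primary column for row identification can be illustrated with a minimal Python sketch. This is not the Document Automation engine's implementation. The function name `group_rows_by_primary_column`, the sample data, and the rule that a non-empty primary cell starts a new row are all assumptions made for illustration.

```python
from typing import Dict, List


def group_rows_by_primary_column(
    lines: List[Dict[str, str]], primary: str
) -> List[Dict[str, str]]:
    """Group extracted text lines into table rows.

    Assumption: a line whose primary-column cell is non-empty starts a new
    row; any other line is a continuation of the previous row.
    """
    rows: List[Dict[str, str]] = []
    for line in lines:
        if line.get(primary, "").strip() or not rows:
            # A value in the primary column marks the start of a new row.
            rows.append(dict(line))
        else:
            # Continuation line: append its non-empty cells to the current row.
            current = rows[-1]
            for column, text in line.items():
                if text.strip():
                    current[column] = (current.get(column, "") + " " + text).strip()
    return rows


# Hypothetical sample data: the second line continues the first item's description.
lines = [
    {"Item number": "1001", "Description": "Steel bolt", "Quantity": "10"},
    {"Item number": "", "Description": "M8 x 40 mm", "Quantity": ""},
    {"Item number": "1002", "Description": "Hex nut", "Quantity": "25"},
]
rows = group_rows_by_primary_column(lines, "Item number")
for row in rows:
    print(row)
```

In this sketch, each Item number value anchors exactly one row, which is why only one column can act as the primary column.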

End of table indicator

For example, when you process a document, the entire table is extracted, whereas you want row data only up to Total payable. In such cases, you can specify the End of table indicator value so that table data up to that value (excluding the End of table indicator value itself) is extracted and no further rows are extracted.
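The effect of an end of table indicator can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the product's code. The function name `rows_until_indicator`, the sample table, and the case-insensitive substring match are assumptions.

```python
from typing import List


def rows_until_indicator(rows: List[List[str]], indicator: str) -> List[List[str]]:
    """Return the rows that precede the first row containing the indicator.

    The row that contains the indicator is excluded, as are all later rows.
    Assumption: matching is a case-insensitive substring test on each cell.
    """
    kept: List[List[str]] = []
    needle = indicator.lower()
    for row in rows:
        if any(needle in cell.lower() for cell in row):
            break  # Stop at the indicator row without keeping it.
        kept.append(row)
    return kept


# Hypothetical sample table with a totals row and a trailing notes row.
table = [
    ["1001", "Steel bolt", "12.50"],
    ["1002", "Hex nut", "4.00"],
    ["Total payable", "", "16.50"],
    ["Notes", "Payment due in 30 days", ""],
]
extracted = rows_until_indicator(table, "Total payable")
for row in extracted:
    print(row)
```

If the indicator never appears, this sketch keeps every row, which mirrors extracting the entire table when no indicator is set.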

Header label

You can change a header label when there is a label mismatch in the table data, for example, when the extracted header label is Unit Price but you want the header label to be Price.
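A label-only change of this kind amounts to renaming a column while leaving its cells untouched. The following Python sketch shows that idea under stated assumptions: the `rename_header` function, the `Table` type alias, and the sample values are hypothetical and are not part of Document Automation.

```python
from typing import Dict, List

# Hypothetical in-memory table: header label mapped to its column cells.
Table = Dict[str, List[str]]


def rename_header(table: Table, old: str, new: str) -> Table:
    """Return a copy of the table in which column `old` is labeled `new`.

    Only the label changes; the cell data and the column order are preserved.
    """
    return {
        (new if header == old else header): list(cells)
        for header, cells in table.items()
    }


extracted = {
    "Item number": ["1001", "1002"],
    "Unit Price": ["12.50", "4.00"],
}
relabeled = rename_header(extracted, "Unit Price", "Price")
print(list(relabeled))
```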

Another use case is to re-map the header label together with its column data. For example, after extraction, the Price column from the learning instance is extracted as Extended Price, but you want the header label Unit Price along with its column data. In such cases, change the Extended Price header label to Unit Price, and then select and re-map at least two cells from the Unit Price column.
Changing header label to get the required header along with column data
The following micro-animation demonstrates an example of setting Item number as the primary column so that the data is extracted into separate rows instead of a single cell.