Improve table data extraction
- Updated: 2024/11/12
Use the advanced training settings to train your documents and give the Document Automation extraction engine additional inputs that improve table data extraction. You can configure the following settings:
- Primary column: Set the primary column for row identification based on your requirements.
- End of table indicator: Add an end of table indicator value so that the system extracts rows only up to that value. The end of table indicator value itself is not extracted.
- Header labels: Adjust or re-map the table fields as required.
Prerequisites
- The Advanced training setting option is available only if the Improve accuracy using validation option is enabled.
- Ensure that you have the Train groups permission to provide information about header labels, end of table indicator, and a primary column used for row detection.
- There can be only one primary column.
- The end of table indicator is a text system-identified region (SIR).
Procedure
Primary column
For example, after document extraction, multi-line table data from the Item number column is extracted into a single row, but you want each item extracted into a separate row. In such cases, set Item number as the primary column to improve table extraction. For more details, see Example of setting primary column using advanced training settings.
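The idea behind a primary column can be illustrated with a small sketch. This is not the Document Automation engine's implementation; it assumes a hypothetical model in which each physical text line is a dictionary of column name to text, and that a non-empty cell in the primary column marks the start of a new row, while lines with an empty primary cell are continuation lines of the current row.

```python
from typing import Dict, List

# Hypothetical model: one physical text line maps column name -> text.
Line = Dict[str, str]


def group_rows(lines: List[Line], primary_column: str) -> List[Dict[str, str]]:
    """Illustrative sketch only: start a new row whenever the primary-column
    cell is non-empty; otherwise append the line's text to the current row."""
    rows: List[Dict[str, str]] = []
    for line in lines:
        if line.get(primary_column, "").strip() or not rows:
            # A value in the primary column (or the very first line) opens a new row.
            rows.append(dict(line))
        else:
            # Continuation line: merge each non-empty cell into the current row.
            current = rows[-1]
            for col, text in line.items():
                if text.strip():
                    current[col] = (current.get(col, "") + " " + text).strip()
    return rows


lines = [
    {"Item number": "A-100", "Description": "Steel bolt", "Price": "2.50"},
    {"Item number": "", "Description": "M8 x 40", "Price": ""},
    {"Item number": "A-200", "Description": "Washer", "Price": "0.10"},
]
rows = group_rows(lines, "Item number")
```

With Item number as the primary column, the sample yields two rows, one per item number, and the wrapped description line is folded into the first row.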
End of table indicator
For example, when you process a document, the entire table is extracted, whereas you want to extract rows only up to Total payable. In such cases, specify Total payable as the End of table indicator value: rows up to that value are extracted, the End of table indicator value itself is excluded, and no further rows are extracted.
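The stopping rule described above can be sketched as follows. This is only an illustration of the behavior, not the product's code; the function name and the case-insensitive substring match on the indicator text are assumptions.

```python
from typing import List


def rows_until_indicator(rows: List[List[str]], indicator: str) -> List[List[str]]:
    """Illustrative sketch: keep rows until the first row whose cells contain
    the end of table indicator (assumed case-insensitive substring match).
    The indicator row and every row after it are excluded."""
    kept: List[List[str]] = []
    needle = indicator.lower()
    for row in rows:
        if any(needle in cell.lower() for cell in row):
            break  # stop extraction; the indicator row itself is not kept
        kept.append(row)
    return kept


table = [
    ["A-100", "Steel bolt", "2.50"],
    ["A-200", "Washer", "0.10"],
    ["Total payable", "", "2.60"],
    ["Notes", "Thank you", ""],
]
extracted = rows_until_indicator(table, "Total payable")
```

Here only the two item rows are kept; the Total payable row and the Notes row after it are dropped.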
Header label
When a table header label is mismatched, you can change it. For example, if the extracted header label is Unit Price but you want it to be Price, change the header label.
Another use case is re-mapping a column: you can change the header label together with all of its column data, for example all values of Unit Price. You can use auto-fill to speed up this re-mapping. For example, after extraction, the Price column from the learning instance is extracted as Extended Price, but you want the header label to be Unit Price along with its column data. In such cases, change the Extended Price header label to Unit Price, and then select and re-map all the cell values from the Unit Price column.
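The difference between the two header label use cases can be sketched with plain dictionaries. This is a conceptual illustration under assumed, hypothetical names (`rename_header`, `remap_column`, and a table modeled as header label to list of cell values); it does not reflect how Document Automation stores or re-maps fields.

```python
from typing import Dict, List

# Hypothetical model: header label -> list of cell values in that column.
Table = Dict[str, List[str]]


def rename_header(extracted: Table, old: str, new: str) -> Table:
    """Label-only fix: keep the column's values and change only its header."""
    return {(new if label == old else label): values for label, values in extracted.items()}


def remap_column(extracted: Table, document: Table, wrong_label: str, target_label: str) -> Table:
    """Re-map fix: replace the wrongly mapped column with the target label
    and all cell values taken from that column in the document."""
    result: Table = {}
    for label, values in extracted.items():
        if label == wrong_label:
            result[target_label] = list(document[target_label])
        else:
            result[label] = values
    return result


document = {"Unit Price": ["2.50", "0.10"], "Extended Price": ["25.00", "1.00"]}
extracted = {"Item number": ["A-100", "A-200"], "Extended Price": ["25.00", "1.00"]}

relabeled = rename_header({"Unit Price": ["2.50", "0.10"]}, "Unit Price", "Price")
remapped = remap_column(extracted, document, "Extended Price", "Unit Price")
```

A label-only change keeps the values as they are, whereas a re-map replaces both the label and every cell value with those of the intended column.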