Document types

A document type classifies a document by its purpose, layout, and content. Document Automation can process three document types: structured, semi-structured, and unstructured.

Structured documents

Structured documents follow a consistent structure and a clear layout in which data is typed or written, which makes it easier for automated systems to extract and process the data. The data extraction model for these documents combines optical character recognition (OCR) with a template-based model to extract key-value pairs and table data.

The following are examples of structured documents:

  • Application forms
  • Surveys
  • Passports
  • Tax forms
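
The template-based idea described above can be illustrated with a minimal sketch. This is not Document Automation's implementation: it assumes the OCR step has already produced words with page coordinates, and the names `Word`, `Region`, `extract_with_template`, and the sample coordinates are all hypothetical. Because every document of a structured type shares one layout, a single template of fixed regions is enough to locate each field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR word with the coordinates of its center (hypothetical shape)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Region:
    """A fixed rectangle on the page where a field value is expected."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return (self.left <= word.x <= self.right
                and self.top <= word.y <= self.bottom)


def extract_with_template(words, template):
    """Return {field: text} by collecting the words inside each region."""
    result = {}
    for field, region in template.items():
        inside = [w for w in words if region.contains(w)]
        # Read top to bottom, then left to right.
        inside.sort(key=lambda w: (w.y, w.x))
        result[field] = " ".join(w.text for w in inside)
    return result


# Made-up OCR output for a passport-style form (illustrative values only).
OCR_WORDS = [
    Word("Surname", 50, 100), Word("DOE", 200, 100),
    Word("Given", 50, 130), Word("names", 90, 130),
    Word("JANE", 200, 130), Word("ANN", 250, 130),
]

# One template serves every document that shares this layout.
TEMPLATE = {
    "surname": Region(150, 90, 400, 110),
    "given_names": Region(150, 120, 400, 140),
}

FIELDS = extract_with_template(OCR_WORDS, TEMPLATE)
print(FIELDS)
```

The sketch relies entirely on field positions being stable, which is why this approach suits structured documents and breaks down once layouts start to vary.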

Semi-structured documents

Semi-structured documents have some structure or a predictable format, like structured documents, but their layout or content varies. Documents of the same kind might contain common data elements, but those elements can appear in different locations from one document to the next. The data extraction model for these documents combines OCR with keyword-based extraction, regular expressions, and validation feedback to extract key-value pairs and table data.

The following are examples of semi-structured documents:

  • Invoices
  • Purchase orders (PO)
  • Bills of lading
  • Explanations of benefits (EOB)
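
Keyword-based extraction with regular expressions, as described above, can be sketched as follows. This is an illustration under assumptions, not the product's extraction model: the patterns in `FIELD_PATTERNS`, the function `extract_fields`, and both sample invoices are hypothetical, and the validation feedback step is not modeled. Each pattern anchors on a keyword rather than a page position, so the same field is found wherever it appears.

```python
import re

# Hypothetical patterns: a keyword, an optional separator, then the value.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\binvoice\s*(?:number|no\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
        re.IGNORECASE,
    ),
    "po_number": re.compile(
        r"\b(?:PO|purchase\s+order)\b\s*(?:number|no\.?|#)?\s*[:\-]?\s*([A-Z0-9\-]+)",
        re.IGNORECASE,
    ),
    "total": re.compile(
        r"\btotal(?:\s+due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})",
        re.IGNORECASE,
    ),
}


def extract_fields(text):
    """Return {field: value or None} using the first match of each pattern."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Two made-up invoices with the same data elements in different places.
LAYOUT_A = """ACME Corp
Invoice No: INV-1042
PO Number: 88-213
Total Due: $1,250.00
"""

LAYOUT_B = """Total: 310.50
Globex Ltd
Purchase Order # P7731
Invoice # 2024-0099
"""

print(extract_fields(LAYOUT_A))
print(extract_fields(LAYOUT_B))
```

Both layouts yield the same three fields even though the labels and their order differ. A field whose keyword is missing comes back as `None`, which is the kind of gap that a human validation step would be expected to catch.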

Unstructured documents

Unstructured documents lack a standard format, a fixed layout, and labeled data. The data is mostly in natural language without a consistent structure. The data extraction model for these documents combines OCR with natural language processing (NLP) and generative AI technologies to perform semantic analysis and extract key-value pairs and table data.

The following are examples of unstructured documents:

  • Legal documents
  • Correspondence (including emails)
  • Reports

Document Automation can extract data from all of these document types. However, knowing which category your documents fall into helps you decide which data extraction options to use.

Note: The Improve accuracy using validation option, which provides validation feedback, is not supported for unstructured documents.