Using pre-trained document types
- Updated: 2024/09/12
A pre-trained document type is a model that has already been trained on a large dataset of similar documents, such as invoices, arrival notices, and bills of lading.
Overview
The pre-training is done either in-house or by third-party providers, so customers do not need to do it themselves. These document types are designed to extract key-value pairs, data from tables, and unstructured information from documents of the same or similar types. Pre-trained document types, or pre-trained models, come with a set of predefined fields that users can select and customize when creating a learning instance.
Use pre-trained document types to achieve the following:
- Rapid deployment
- Implement document extraction processes quickly, without spending time creating, training, and deploying custom models.
- Improved accuracy
- Because these document types are trained on large document sets, they provide higher accuracy than custom document types.
Pre-trained document types are supplied by extraction providers. An extraction provider is a service that specializes in processing specific document types and extracting data from documents based on predefined rules or models. The following extraction providers are supported:
- Automation Anywhere
- This extraction service is developed in-house and trained to extract data from invoices, arrival notices, bills of lading, and similar document types. These document types can optionally connect to generative AI services such as Azure OpenAI or Anthropic to improve the model’s data extraction.
- Google Document AI
- This extraction service is developed by Google and offers pre-trained parsers to extract data from documents such as invoices, receipts, and utility bills. Integrating these parsers in Document Automation gives users ready-to-use document processing capabilities.
Support matrix
The following table lists the pre-trained document types supported in Document Automation.
Document type | Extraction provider | Generative AI provider |
---|---|---|
Invoices | Automation Anywhere | Yes |
Invoices | Google Document AI | No |
Arrival Notice | Automation Anywhere | Yes* |
Bill of Lading | Automation Anywhere | Yes* |
Packing List | Automation Anywhere | Yes* |
Receipts | Google Document AI | No |
Utility Bill | Google Document AI | No |
Waybill | Automation Anywhere | Yes* |
*The generative AI provider option is enabled by default and cannot be disabled for this document type.
Note: If you do not find the document type that you want to use, you can use the User-defined document type to support your use case. See Document types: support matrix.
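The support matrix above can also be expressed as a small lookup table, which may be handy when a script needs to check a document type before a learning instance is created. The following is a minimal sketch, not a Document Automation API: the names `Support`, `SUPPORT_MATRIX`, `providers_for`, and `choose_document_type` are hypothetical, and the values `"optional"`, `"always_on"`, and `"none"` are one reading of the table's `Yes`, `Yes*`, and `No` entries.

```python
from typing import List, NamedTuple


class Support(NamedTuple):
    """One row of the support matrix (hypothetical structure, not a product API)."""
    provider: str
    generative_ai: str  # "optional" (Yes), "always_on" (Yes*), or "none" (No)


# Rows transcribed from the support matrix in this topic.
SUPPORT_MATRIX = {
    "Invoices": [
        Support("Automation Anywhere", "optional"),
        Support("Google Document AI", "none"),
    ],
    "Arrival Notice": [Support("Automation Anywhere", "always_on")],
    "Bill of Lading": [Support("Automation Anywhere", "always_on")],
    "Packing List": [Support("Automation Anywhere", "always_on")],
    "Receipts": [Support("Google Document AI", "none")],
    "Utility Bill": [Support("Google Document AI", "none")],
    "Waybill": [Support("Automation Anywhere", "always_on")],
}


def providers_for(document_type: str) -> List[Support]:
    """Return the matrix rows for a document type, or [] if it is not pre-trained."""
    return SUPPORT_MATRIX.get(document_type, [])


def choose_document_type(document_type: str) -> str:
    """Fall back to the User-defined document type when no pre-trained one exists."""
    return document_type if providers_for(document_type) else "User-defined"
```

For example, `providers_for("Invoices")` returns both provider rows, while `choose_document_type("Purchase Order")` returns `"User-defined"` because that type does not appear in the table.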