Document classification overview

Document classification is an automated process that uses classifier packages to group documents, or pages within a document, into categories based on their attributes, such as layout, content, or both.

You can use this process in scenarios where you need to organize documents before processing them. For example, after classification is complete, you can process the classified documents in the appropriate learning instances.

How classification works

Document classification enables document processing in the following ways:

Organizing documents
When a file contains numerous documents, document classification sorts them into relevant categories, making them easier to manage and retrieve. Such a file can contain documents of the same type (such as invoices) or of different types (such as invoices, bills of lading, and purchase orders).
Streamlined workflow
When the classifier identifies documents correctly, you can send each classified document to the right document processing workflow, which improves the accuracy of document identification and data extraction. For example, you can process each document in the learning instance that is appropriate for extracting its data.
Increased efficiency
By reducing manual effort spent on sorting and classifying documents, document classification saves time and minimizes manual errors.
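
The routing idea described above can be sketched in a few lines of Python. This is only an illustration, not the product's API: the category names, the learning instance names, the `route_documents` function, and the `manual_review` fallback are all hypothetical and assume that the classifier output is available as (file name, category) pairs.

```python
from collections import defaultdict

# Hypothetical mapping from a classifier category to the learning
# instance that should process documents of that category.
LEARNING_INSTANCE_BY_CATEGORY = {
    "invoice": "Invoices-LI",
    "purchase_order": "PurchaseOrders-LI",
    "bill_of_lading": "BillOfLading-LI",
}


def route_documents(classified, fallback="manual_review"):
    """Group file names by the learning instance that should process them.

    `classified` is an iterable of (file_name, category) pairs.
    Categories without a mapping go to the `fallback` queue.
    """
    queues = defaultdict(list)
    for file_name, category in classified:
        target = LEARNING_INSTANCE_BY_CATEGORY.get(category, fallback)
        queues[target].append(file_name)
    return dict(queues)


queues = route_documents([
    ("a.pdf", "invoice"),
    ("b.pdf", "unknown"),
    ("c.pdf", "invoice"),
])
```

Sending unmapped categories to a separate queue, rather than dropping them, keeps misclassified or unexpected documents visible for manual handling.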

Types of classifiers

You can choose one of the following classifier options based on your individual use case or business requirements.

Document Classifier

This classifier groups documents into different category folders (representing document categories) based on the first page of each document.

Document Classifier can also classify the individual pages within a document into different folders. If the file contains multi-page documents, you must merge the relevant pages after page-level classification is complete so that you can process them as single documents.

For example, suppose a mortgage document includes customer information (KYC) on pages 1 and 2 and a customer bank statement on pages 3 and 4. Pages 1 and 2 are classified as customer information and saved in the KYC folder, and pages 3 and 4 are classified as bank statements and saved in the bank statement folder. To process the KYC pages as a single document, you must merge pages 1 and 2 stored in the KYC folder. Similarly, to process the bank statement as a single document, you must merge pages 3 and 4 stored in the bank statement folder.
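
The mortgage example can be expressed as a small Python sketch that works out which pages need to be merged for each category. It is an illustration only and does not call any product API: the `pages_to_merge` function is hypothetical, and it assumes that the page-level result is available as a list of category labels in page order.

```python
def pages_to_merge(page_categories):
    """Map each category to the ordered list of page numbers filed under it.

    `page_categories` holds one category label per page, in page order.
    Page numbers in the result are 1-based.
    """
    groups = {}
    for page_number, category in enumerate(page_categories, start=1):
        groups.setdefault(category, []).append(page_number)
    return groups


# Pages 1-2 are KYC, pages 3-4 are the bank statement.
mortgage_file = ["KYC", "KYC", "bank statement", "bank statement"]
merge_plan = pages_to_merge(mortgage_file)
```

Each entry in `merge_plan` lists the pages that would be merged into one document for that category. The actual merge step (combining the page files) is left out because it depends on the tooling you use.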

Advanced Classifier
In addition to the capabilities of Document Classifier, this classifier can split a document into multiple documents and classify documents or pages using predefined rules. It requires a separate license from Skilja. We recommend that you use this classifier only when Document Classifier does not meet your requirements.
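
To show how splitting differs from filing pages into folders, the following Python sketch cuts a page sequence wherever the category changes. This is an assumption made for illustration: the actual splitting logic and rule syntax of Advanced Classifier are not described here, and the `split_at_category_changes` function is hypothetical.

```python
from itertools import groupby


def split_at_category_changes(page_categories):
    """Split a page sequence into documents at each category change.

    Returns a list of (category, [page numbers]) tuples, one per
    contiguous run of pages that share a category. Pages are 1-based.
    """
    documents = []
    next_page = 1
    for category, run in groupby(page_categories):
        run_length = len(list(run))
        pages = list(range(next_page, next_page + run_length))
        documents.append((category, pages))
        next_page += run_length
    return documents


parts = split_at_category_changes(
    ["invoice", "invoice", "purchase order", "invoice"]
)
```

Unlike grouping by folder, this keeps the second invoice (page 4) as its own document instead of combining it with pages 1 and 2.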

To understand the differences between Advanced Classifier and Document Classifier, see Comparing Advanced Classifier and Document Classifier.