Review extraction service
After you confirm that the documents you want to extract content from are standard forms, plan which standard forms extraction service best fits your requirements.
The following technologies are available for processing standard forms:
IQ Bot extraction service
This is a template-based extraction service that uses OCR and heuristics to extract content from standard forms. You must train one template for each standard form.
Guidelines for using IQ Bot extraction service
- Documents are of good quality (300 dpi)
- Document content is not very dense
- Input documents do not contain handwritten content (limited support)
- Signatures are currently not supported
- Documents contain simple table layouts (tables that fit within a single page) with clear headers, table boundaries, and so on
- Documents do not contain tables or content with checkboxes (limited support)
- Documents do not have repeated sections (limited support)
Benefits of IQ Bot extraction service
- An integrated and simple out-of-the-box setup
- Various OCR engines to increase accuracy of extraction
- Complex layouts (repeated sections, continuous tables, and so on) can be extracted in specific cases (needs testing)
- Requires only an IQ Bot license
Microsoft Azure Form Recognizer service
This is a third-party technology that provides custom-built Artificial Intelligence (AI) models to extract content from standard forms. You can create custom models by labeling documents and training on them.
Guidelines for using Microsoft Azure Form Recognizer service
- Input documents:
  - can be dense (contain a lot of detail and information) and of reasonable quality (>200 dpi)
  - can contain checkboxes and radio buttons
  - can have handwritten content
  - can contain signatures
  - can contain tables
    The input documents can contain tables that fit within a single page. However, if the standard forms contain tables that span multiple pages, content extraction can fail.
- None of the sections in the input documents are repeated
- Input documents can contain transposed tables
Benefits of Microsoft Azure Form Recognizer service
- Diverse standard form type documents can be processed
- The auto-detection feature can identify different types of tables, such as headerless tables, inverted tables, and so on
- Good support for handwritten forms
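
The guidelines for both services can be read as a checklist. The following Python sketch is one possible way to encode that checklist when triaging a batch of forms. It is illustrative only and is not part of either product: the `FormProfile` class, its field names, and the helper functions are hypothetical, and the choice to suggest IQ Bot first whenever it raises no concerns is an assumption made here, not a documented recommendation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FormProfile:
    """Hypothetical description of a standard form (field names are assumptions)."""
    dpi: int
    dense_content: bool = False
    handwritten: bool = False
    signatures: bool = False
    checkboxes: bool = False
    repeated_sections: bool = False
    multi_page_tables: bool = False


def iq_bot_concerns(form: FormProfile) -> List[str]:
    """List the IQ Bot guidelines that the form does not meet."""
    concerns = []
    if form.dpi < 300:
        concerns.append("resolution below 300 dpi")
    if form.dense_content:
        concerns.append("dense content")
    if form.handwritten:
        concerns.append("handwritten content (limited support)")
    if form.signatures:
        concerns.append("signatures (not supported)")
    if form.checkboxes:
        concerns.append("checkboxes (limited support)")
    if form.repeated_sections:
        concerns.append("repeated sections (limited support)")
    if form.multi_page_tables:
        concerns.append("tables spanning multiple pages")
    return concerns


def form_recognizer_concerns(form: FormProfile) -> List[str]:
    """List the Form Recognizer guidelines that the form does not meet."""
    concerns = []
    if form.dpi <= 200:
        concerns.append("resolution not above 200 dpi")
    if form.repeated_sections:
        concerns.append("repeated sections")
    if form.multi_page_tables:
        concerns.append("tables spanning multiple pages (extraction can fail)")
    return concerns


def suggest_service(form: FormProfile) -> str:
    """Suggest a service; IQ Bot is tried first (an assumption of this sketch)."""
    if not iq_bot_concerns(form):
        return "IQ Bot extraction service"
    if not form_recognizer_concerns(form):
        return "Microsoft Azure Form Recognizer service"
    return "needs testing"


if __name__ == "__main__":
    scanned = FormProfile(dpi=250, handwritten=True, signatures=True)
    print(suggest_service(scanned))
    print(iq_bot_concerns(scanned))
```

A result of "needs testing" means only that the form falls outside both sets of guidelines. Items marked as limited support may still work in specific cases, so treat the output as a starting point for evaluation and not as a final decision.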