Review extraction service

After you confirm that the documents you want to extract content from are standard forms, choose the standard forms extraction service that best fits your requirements.

The following technologies are available for processing standard forms:

IQ Bot extraction service

This is a template-based extraction service that uses optical character recognition (OCR) and heuristics to extract content from standard forms. You must train one template for each standard form.

Guidelines for using IQ Bot extraction service
  • Documents are of good quality (300 dpi)
  • Document content is not very dense
  • Input documents do not contain handwritten content (limited support)
  • Input documents do not contain signatures (currently not supported)
  • Tables have a simple layout (contained within a single page) with clear headers and table boundaries
  • Documents do not contain tables or content with checkboxes (limited support)
  • Documents do not contain repeated sections (limited support)
Benefits of IQ Bot extraction service
  • An integrated and simple out-of-the-box setup
  • Various OCR engines to increase accuracy of extraction
  • Complex layouts (repeated sections, continuous tables, etc.) can be extracted in specific cases (needs testing)
  • Requires only an IQ Bot license

Microsoft Azure Form Recognizer service

This is a third-party technology that provides custom-built Artificial Intelligence (AI) models to extract content from standard forms. You can create custom models by labelling documents and training on them.

Guidelines for using Microsoft Azure Form Recognizer service

  • Input documents:
    • can be dense (contain a lot of detail and information) and of reasonable quality (>200 dpi)
    • can contain checkboxes and radio buttons
    • can have handwritten content
    • can contain signatures
    • can contain tables

      The input documents can contain tables that fit within a single page. However, if the standard forms contain tables that span multiple pages, content extraction can fail.

  • No sections in the input documents are repeated
  • Documents can contain transposed tables

Benefits of Microsoft Azure Form Recognizer service

  • Diverse types of standard forms can be processed
  • The auto-detection feature can identify different types of tables, such as header-less tables, inverted tables, and so on
  • Good support for handwritten forms
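
The guidelines for both services can be read as a simple checklist. The following Python sketch is one possible way to encode that checklist. It is not part of IQ Bot or Form Recognizer: the FormProfile class, its field names, and the suggest_service function are hypothetical, and the thresholds are taken only from the guidelines listed above. Treat the result as a starting point for testing, not as a definitive recommendation.

```python
from dataclasses import dataclass


@dataclass
class FormProfile:
    """Hypothetical description of a standard form (field names are illustrative)."""
    dpi: int
    dense: bool = False
    handwritten: bool = False
    signatures: bool = False
    checkboxes: bool = False
    repeated_sections: bool = False
    multi_page_tables: bool = False


def suggest_service(form: FormProfile) -> str:
    """Suggest an extraction service based on the guidelines in this topic."""
    # Characteristics the guidelines list as unsupported or limited for IQ Bot.
    iq_bot_concerns = [
        form.dpi < 300,
        form.dense,
        form.handwritten,
        form.signatures,
        form.checkboxes,
        form.repeated_sections,
        form.multi_page_tables,
    ]
    if not any(iq_bot_concerns):
        return "IQ Bot extraction service"

    # Characteristics the guidelines flag as a problem for Form Recognizer.
    azure_concerns = [
        form.dpi <= 200,
        form.repeated_sections,
        form.multi_page_tables,
    ]
    if not any(azure_concerns):
        return "Microsoft Azure Form Recognizer service"

    # Neither checklist is fully satisfied, so the form needs hands-on testing.
    return "needs testing"
```

For example, suggest_service(FormProfile(dpi=250, handwritten=True)) returns the Form Recognizer service, because the form fails the IQ Bot checklist on both resolution and handwriting but meets every Form Recognizer guideline.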