Automation 360

Select an OCR engine

  • Updated: 2022/02/03

    You can select the OCR engine that best suits your data extraction requirements, based on your document types. You do not need to restart IQ Bot services for an engine change to take effect.

    During installation, IQ Bot sets the latest version of Tesseract OCR as the default OCR engine. As in prior IQ Bot releases, you can change the default by manually updating the Settings.txt file with the name of the OCR engine you want to use. The engine named in that file becomes the default.

    When creating a learning instance, you can select an OCR engine from the Optical character recognition drop-down list. See Create a learning instance.
    Note: Selecting an OCR engine in the interface overrides the settings in the Settings.txt file.
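The precedence described above (interface selection, then Settings.txt, then the installation default) can be pictured as a small lookup. The following is a minimal, hypothetical sketch, not IQ Bot code: `resolve_ocr_engine` and `DEFAULT_ENGINE` are illustrative names, and the format of Settings.txt is not modeled, only the engine name it would supply.

```python
from typing import Optional

# Hypothetical model of the engine precedence described above.
# DEFAULT_ENGINE and resolve_ocr_engine are illustrative names, not IQ Bot APIs.
DEFAULT_ENGINE = "Tesseract OCR 4"  # set as the default during installation


def resolve_ocr_engine(ui_selection: Optional[str] = None,
                       settings_txt_engine: Optional[str] = None) -> str:
    """Return the OCR engine a new learning instance would use.

    Precedence, highest first:
      1. The engine chosen in the Optical character recognition drop-down list.
      2. The engine named in Settings.txt (its file format is not modeled here).
      3. The installation default, Tesseract OCR 4.
    """
    if ui_selection:
        return ui_selection
    if settings_txt_engine:
        return settings_txt_engine
    return DEFAULT_ENGINE


if __name__ == "__main__":
    print(resolve_ocr_engine())  # Tesseract OCR 4
    print(resolve_ocr_engine(settings_txt_engine="ABBYY FineReader Engine"))  # ABBYY FineReader Engine
    print(resolve_ocr_engine("Google Vision API", "ABBYY FineReader Engine"))  # Google Vision API
```

The last call shows the override from the note: a selection in the interface wins even when Settings.txt names a different engine.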

    The following table lists the OCR engines that IQ Bot supports and their specifications:

    Table 1. List of OCR engines and their specifications

    Tesseract OCR
      • OCR version: 4
      • Handwritten text: N/A
      • Languages supported: English, German, Spanish, Italian, French
      • Document quality: No noise; no dark background; no stamps/watermarks; 200+ dpi
      • Document type: Invoices, POs, etc.; semi-structured formats

    ABBYY FineReader Engine
      • OCR version: 12.3 or 12.4
      • Handwritten text: N/A
      • Languages supported: English, All Latin+, Chinese, Japanese, Korean
      • Document quality: Less noise; dark background with white fonts; has stamps/watermarks; 96+ dpi
      • Document type: Invoices, POs, etc.; semi-structured formats; mortgage forms, tax forms; unstructured formats

    Microsoft Azure Computer Vision OCR engine
      • OCR version: 2.0 or 3.2
      • Handwritten text: English only
      • Languages supported: English, All Latin+, Chinese, Japanese, Korean
      • Document quality: Less noise; dark background; has stamps/watermarks; 96+ dpi
      • Document type: Invoices, POs, etc.; semi-structured formats; passports, driving licenses, etc.; KYC documents

    Google Vision API
      • OCR version: Updated automatically to match the current release
      • Handwritten text: N/A
      • Languages supported: English, All Latin+, Asian
      • Document quality: Less noise; dark background; has stamps/watermarks; 96+ dpi
      • Document type: Invoices, POs, etc.; semi-structured formats; mortgage forms, tax forms; unstructured formats

    Tegaki API
      • OCR version: Check with your Cogent Labs sales representative
      • Handwritten text: Japanese and Korean
      • Languages supported: Japanese and Korean
      • Document quality: No noise; no dark background; no stamps/watermarks; 200+ dpi
      • Document type: Invoices, POs, etc.; semi-structured formats
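The Languages Supported column of Table 1 works as a lookup when you pair an OCR engine with a primary document language. The following is a hypothetical sketch, not IQ Bot code: `TABLE_1_LANGUAGES`, `GROUP_MEMBERS`, `supported_languages`, and `check_selection` are illustrative names. The expansion of the group labels "All Latin+" and "Asian" is an assumption that uses only languages named on this page; the real groups are larger.

```python
from typing import Dict, List, Optional, Set

# Hypothetical encoding of the "Languages Supported" column of Table 1.
# Names are illustrative, not IQ Bot APIs.
TABLE_1_LANGUAGES: Dict[str, List[str]] = {
    "Tesseract OCR": ["English", "German", "Spanish", "Italian", "French"],
    "ABBYY FineReader Engine": ["English", "All Latin+", "Chinese", "Japanese", "Korean"],
    "Microsoft Azure Computer Vision OCR engine": ["English", "All Latin+", "Chinese", "Japanese", "Korean"],
    "Google Vision API": ["English", "All Latin+", "Asian"],
    "Tegaki API": ["Japanese", "Korean"],
}

# ASSUMPTION: group labels are expanded using only languages named on this
# page; the actual groups are larger.
GROUP_MEMBERS: Dict[str, Set[str]] = {
    "All Latin+": {"English", "German", "Spanish", "Italian", "French"},
    "Asian": {"Chinese", "Japanese", "Korean"},
}


def supported_languages(engine: str) -> Set[str]:
    """Expand an engine's Table 1 entries into a set of language names."""
    languages: Set[str] = set()
    for entry in TABLE_1_LANGUAGES[engine]:
        # A plain language name stands for itself; a group label expands.
        languages |= GROUP_MEMBERS.get(entry, {entry})
    return languages


def check_selection(engine: str, primary_language: str) -> Optional[str]:
    """Return an error message for an unsupported combination, else None."""
    if primary_language not in supported_languages(engine):
        return f"{engine} does not support {primary_language}"
    return None


if __name__ == "__main__":
    print(check_selection("Tesseract OCR", "German"))  # None
    print(check_selection("Tegaki API", "English"))  # Tegaki API does not support English
```

A non-None result corresponds to the situation the procedure describes, where an unsupported engine and language combination produces an error in the Optical character recognition drop-down list.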

    Procedure

    1. On the Create a new learning instance page, select the domain and language of the documents.
      On the My learning instance list page, the OCR Engine column shows the OCR engine used to create each learning instance. This information can help you decide how to process your documents.
    2. The Fields to extract and Advanced Settings sections are displayed when you select the domain.
      Each domain has a predefined list of supported primary languages. The Primary language of documents drop-down list offers only the languages that the selected domain supports.
      Note: If you select a language from the Primary language of documents drop-down list and then select an engine that does not support that language, the system displays an error message in the Optical character recognition drop-down list.
    3. Click Advanced Settings > Optical character recognition to display the OCR engine options drop-down list.

      If the OCR engine selection is invalid, the Create instance and analyze option is not enabled.

      Note: IQ Bot automatically installs all OCR engines during the installation process, except for ABBYY FineReader Engine.
      Important: You can configure the selected OCR engine only in Automation 360 IQ Bot On-Premises. In Automation 360 IQ Bot Cloud, OCR settings are not accessible and cannot be edited, except for ABBYY FineReader Engine, whose configuration settings you can edit using the appConfigurations REST API.

      You can select from the following:

      • Tesseract OCR 4: This is the default engine, unless a different engine is set in the Settings.txt file.
      • ABBYY FineReader Engine: To verify that ABBYY FineReader Engine is installed and available on your machine, check the Settings.txt file, the OCR Plug-ins folder for the SDK files, and the Optical character recognition drop-down list.
        Note: ABBYY FineReader Engine is also supported in the IQ Bot [Local Device] package and the IQ Bot Extraction package.
        See Use ABBYY FineReader Engine OCR engine in IQ Bot.
      • Microsoft Azure Computer Vision OCR engine: IQ Bot supports all languages that this OCR engine supports.
        See Use Microsoft Azure Computer Vision OCR engine.
      • Google Vision API: IQ Bot supports Google Vision API as an OCR engine, including all languages that this engine supports.
        See Use Google Vision API OCR engine.
      • Tegaki API: IQ Bot supports this OCR engine for extracting data from Japanese and Korean language documents. To use Tegaki API, you must download and use your own private license.
        Note: Tegaki API OCR engine is not supported in Automation 360 IQ Bot Cloud.
        See Use Tegaki API OCR engine.
      • My PDF documents do not have images: All PDF documents that you upload are processed using PDFBox by default, regardless of the OCR engine you have specified or selected.
        If you are uploading non-PDF documents or PDF documents that contain images, clear the My PDF documents do not have images check box so that the OCR engine you have specified or selected is used to process the documents.
        The check box is selected by default. To disable this feature, see Disable PDFBox option.
      Tip: If IQ Bot is unable to extract data from low quality or handwritten documents, troubleshoot the issue:

      IQ Bot unable to extract data from low quality and Handwritten documents (A-People login required)

      Note: Use the following files to change the OCR settings:
      • AbbyyImagePreProcessingSettings.json
      • LangugeCodeToAbbyyLanguageCode.json
      • TegakiOCREngineSettings.json
      • Azure3OCREngineSettings.json
      • GoogleOCREngineSettings.json
      • AzureOCREngineSettings.json
      How to change OCR Settings in IQ Bot (A-People login required)
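The routing rule under the My PDF documents do not have images option can be modeled as a small function. This is a hypothetical sketch, not IQ Bot code: `processing_engine` and its parameters are illustrative names. It assumes, following the first sentence of that option's description, that only files with a .pdf extension are routed to PDFBox while the check box is selected.

```python
from pathlib import Path

# Hypothetical model of the "My PDF documents do not have images" rule.
# processing_engine is an illustrative name, not an IQ Bot API.


def processing_engine(filename: str, selected_engine: str,
                      pdf_has_no_images: bool = True) -> str:
    """Return what would process an uploaded document.

    pdf_has_no_images mirrors the check box, which is selected by default.
    ASSUMPTION: only files with a .pdf extension are routed to PDFBox.
    """
    is_pdf = Path(filename).suffix.lower() == ".pdf"
    if is_pdf and pdf_has_no_images:
        # Selected check box: PDFs bypass the chosen OCR engine.
        return "PDFBox"
    # Cleared check box, or a non-PDF file: the selected OCR engine is used.
    return selected_engine


if __name__ == "__main__":
    print(processing_engine("invoice.pdf", "ABBYY FineReader Engine"))  # PDFBox
    print(processing_engine("invoice.pdf", "ABBYY FineReader Engine", pdf_has_no_images=False))
    print(processing_engine("receipt.png", "Tesseract OCR 4"))
```

The default argument reflects why scanned PDFs can yield poor results until the check box is cleared: with the default, the selected OCR engine never sees a PDF.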