Create a learning instance

Create a learning instance and upload sample documents for training. In this step, you define the data elements for a single document type, such as an invoice or a purchase order, and the fields that you want to extract.

Prerequisites

Ensure the sample documents meet the following requirements:
  • Each document is a separate file. For example, if you have downloaded an email and its attachments into a single PDF, you must separate the email body from the attachments. See Using the Split document action.
  • The documents are in one of the following supported file types:
    • PDF
    • JPG
    • JPEG
    • PNG
    • TIFF
  • Use documents with a resolution value of at least 300 dots per inch (dpi).
  • In staging, you can upload a maximum of 150 documents per learning instance, each with a file size of up to 10 MB.
  • In production, each document can have a file size of up to 50 MB. The maximum number of documents allowed per learning instance depends on the license.
  • PDFBox OCR has no limit on the number of pages per document.
  • With an image-based OCR, you can upload up to 60 pages per document.
  • You can upload up to a file size of 12 MB. You can upload additional documents after creating the learning instance.
  • The file names of the documents that you upload should not start with special characters, such as the hyphen (-).
Note:
  • Tesseract4 OCR currently has a known limitation that restricts each document to fewer than 60 pages.
  • Azure confidential computing enables organizations to upload encrypted data to secured storage, such as private folders on a virtual machine. If you upload documents from such secured folders to IQ Bot, the documents are moved to Unclassified status because data extraction is not supported for them.
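
Some of the requirements above can be checked before you upload anything. The following is a minimal Python sketch, not part of IQ Bot: the function name check_document, the interpretation of 1 MB as 1024 × 1024 bytes, and the rule that any non-alphanumeric leading character counts as "special" are all assumptions made for illustration. It does not check resolution or page counts.

```python
from pathlib import Path
from typing import List

# File types listed in the prerequisites above.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tiff"}

# Per-document size limits from the prerequisites.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
MAX_BYTES = {
    "staging": 10 * 1024 * 1024,
    "production": 50 * 1024 * 1024,
}


def check_document(name: str, size_bytes: int, environment: str = "staging") -> List[str]:
    """Return the problems found for one sample document (empty list if none)."""
    problems = []

    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or '(none)'}")

    if size_bytes > MAX_BYTES[environment]:
        problems.append("file exceeds the size limit")

    # The prerequisites name only the hyphen as an example. Treating every
    # non-alphanumeric leading character as "special" is an assumption.
    if not name[:1].isalnum():
        problems.append("file name starts with a special character")

    return problems
```

For example, check_document("-scan.png", 1000) reports the leading hyphen, while check_document("invoice_001.pdf", 2_000_000) returns an empty list.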

When you start with a collection of documents to insert into a digital process, you will probably have a mix of document types, formats, and orientations. An invoice, for example, has a consistent set of data elements, whereas a purchase order contains a different set of data elements. You must create a different learning instance for each of these document types, using the following steps:

Procedure

  1. Navigate to LEARNING INSTANCES and click the New instance option.
  2. In the Create new learning instance screen, enter the following information:
    1. Instance name: Enter a unique name.
      IQ Bot versions A360.21 and earlier do not allow duplicate learning instance names. Even if you delete a learning instance, its name cannot be reused. From IQ Bot version A360.22, you can create duplicate learning instance names and reuse the name of a deleted learning instance.
    2. Optional: Description: Enter a description.
    3. Document type: Select the document type from the drop-down list.
      Do not select Standard forms as the Document type when creating a learning instance. Based on the option you select, a predefined set of form and table fields for the domain type appears. For example, when you select Invoices, the common forms and tables of an invoice appear.
      Note: If you want to create a domain to use specifically for this learning instance, select Document type > Other and enter a domain name. In the upcoming steps, you will customize the domain.

      If you want to create a domain to use in more than one learning instance and you have the required access permissions, you can work with Automation Anywhere support to create a custom domain. See Custom domains in IQ Bot for more information.

    4. Primary language of documents: Use the drop-down menu to select a language for the learning instance.
      To create custom domains in other languages and access up to 190 languages that IQ Bot supports, contact Automation Anywhere support.
      Important: If you are unable to see all languages in the IQ Bot interface, troubleshoot the issue: Unable to extract data from Multiple languages in a document (A-People login required)
    5. Upload your documents: Click the Browse option to upload sample documents.
  3. Select or de-select fields in the Common form fields and Common table/repeated sections fields sections.
    Form fields appear one time in a document, such as the invoice date or number. Table fields are fields that recur throughout the document, such as the item total or quantity.
    To see all the possible fields, click Additional form fields or Additional table/repeated section fields.
  4. Optional: Add additional fields by entering the field name in the Additional form fields or Additional table/repeated section fields section.
    Follow the naming conventions when you enter a name in the Add fields (Optional) field:
    • Field names can only begin with alphabetical characters (A-Z and a-z).
    • Field names can only include alphanumeric characters and spaces.
    • Field names cannot end with a space.
  5. Optical Character Recognition: Select the required OCR engine.
  6. Optional: De-select the My PDF documents do not have images check box. To learn more, see Disable PDFBox option.
    When this check box is selected, IQ Bot uses PDFBox OCR to process PDF documents; non-PDF documents are processed by the OCR you selected in the previous step.
  7. Checkbox auto-detection: Select the Detect checkboxes check box to enable this feature.
    Selecting this option allows IQ Bot to automatically detect check boxes in a document. However, it might increase the processing time of documents.
  8. Click the Create instance and analyze option to create the learning instance.
    The system analyzes and sorts the training documents into logical groups based on field identification and shows the details in the Learning Instance > Summary tab.
When a new learning instance is created, the sample documents you uploaded are analyzed and sorted into groups based on the document characteristics. To learn more, see About the Classifier.
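
The field naming conventions in step 4 can be expressed as a single pattern, which is useful if you prepare field names in bulk before entering them. The following is a minimal Python sketch under that reading of the rules. The function name is_valid_field_name is hypothetical and is not part of IQ Bot, and IQ Bot might apply additional checks that are not listed in this topic.

```python
import re

# One leading letter (A-Z or a-z), then optionally any mix of letters,
# digits, and spaces, as long as the final character is not a space.
FIELD_NAME_PATTERN = re.compile(r"[A-Za-z](?:[A-Za-z0-9 ]*[A-Za-z0-9])?")


def is_valid_field_name(name: str) -> bool:
    """Return True if the name follows the three naming conventions in step 4."""
    # fullmatch requires the whole string to match, so no anchors are needed.
    return FIELD_NAME_PATTERN.fullmatch(name) is not None
```

With this pattern, "Invoice Date" and "PO Number 2" are accepted, while "2nd Item" (starts with a digit), "Item-Total" (contains a hyphen), and "Total " (ends with a space) are rejected.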

Next steps

After the Classifier finishes sorting the documents, you are redirected to the Designer, where you will train bots to extract data from each sample document. See Train a learning instance.