Disable PDFBox option

The PDFBox option is enabled by default. Disable it when you train on hybrid PDF documents that contain both images and text.

The PDFBox option works best with fully digital documents. For hybrid documents that contain both images and text, we recommend disabling the PDFBox option to get better document classification.
Note: The PDFBox option is enabled in the system by default. Keep it enabled only if you plan to process digital documents; otherwise, processing will fail.
If PDFBox is enabled, you can process the following PDF types:
  • Vector and hybrid PDFs are processed using PDFBox.
  • Raster PDFs are first processed using PDFBox. If no segment is found, the PDF is processed again using Document Image OCR.
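
The routing described above can be summarized as a small decision function. This is only an illustrative sketch, not IQ Bot code: the names PdfType, extraction_path, and pdfbox_found_segments are hypothetical, and the disabled case follows the statement later in this topic that IQ Bot uses your selected OCR engine when PDFBox is turned off.

```python
from enum import Enum


class PdfType(Enum):
    # Hypothetical labels for the three PDF types named in this topic.
    VECTOR = "vector"
    HYBRID = "hybrid"
    RASTER = "raster"


def extraction_path(pdf_type, pdfbox_enabled, pdfbox_found_segments=True):
    """Return the engines applied to a PDF, in order (illustrative only)."""
    if not pdfbox_enabled:
        # With PDFBox disabled, the selected OCR engine handles PDFs as well.
        return ["selected OCR engine"]
    if pdf_type in (PdfType.VECTOR, PdfType.HYBRID):
        return ["PDFBox"]
    # Raster PDF: PDFBox runs first; OCR is the fallback when no segment is found.
    if pdfbox_found_segments:
        return ["PDFBox"]
    return ["PDFBox", "Document Image OCR"]
```

For example, a raster PDF in which PDFBox finds no segment goes through both PDFBox and Document Image OCR, while a vector PDF goes through PDFBox only.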
You can enable or disable the PDFBox option in IQ Bot in two ways:
  • In the UI, when you create a learning instance. On the Create new learning instance page, go to Advanced Settings > Optical character recognition and select or clear the My PDF documents do not have images check box.
  • In the Setting.txt file, as described in the following procedure.

Procedure

  1. Navigate to C:\Program Files (x86)\Automation Anywhere IQ Bot\Configurations.
  2. Open the Setting.txt file and change PDFBoxOCREnabled=true to PDFBoxOCREnabled=false.
    This turns off PDFBox processing of uploaded documents for learning instances created after the change. Existing learning instances are not affected. IQ Bot then uses your selected OCR engine for PDF documents as well.
    Note: When PDFBox is disabled, ensure that your PDF document has fewer than 60 pages.
  3. After you update the Setting.txt file, you do not need to stop and uninstall, and then install and start, the IQ Bot services.
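
If you need to apply the change in step 2 on several machines, the edit can be scripted. The following is a minimal sketch, not an Automation Anywhere tool: the function names are hypothetical, and it assumes that Setting.txt holds one key=value pair per line and is UTF-8 encoded. Back up the file before you try it.

```python
from pathlib import Path


def set_pdfbox_enabled(text: str, enabled: bool) -> str:
    """Return the settings text with PDFBoxOCREnabled set to true or false."""
    value = "true" if enabled else "false"
    out = []
    found = False
    for line in text.splitlines(keepends=True):
        key, sep, _ = line.partition("=")
        if sep and key.strip() == "PDFBoxOCREnabled":
            # Keep the line's original line ending (LF, CRLF, or none).
            ending = line[len(line.rstrip("\r\n")):]
            out.append(f"PDFBoxOCREnabled={value}{ending}")
            found = True
        else:
            out.append(line)
    if not found:
        raise KeyError("PDFBoxOCREnabled not found in settings text")
    return "".join(out)


def disable_pdfbox(settings_path: Path) -> None:
    """Rewrite Setting.txt in place with PDFBoxOCREnabled=false."""
    # newline="" disables newline translation so line endings are preserved.
    with open(settings_path, "r", encoding="utf-8", newline="") as f:
        original = f.read()
    with open(settings_path, "w", encoding="utf-8", newline="") as f:
        f.write(set_pdfbox_enabled(original, False))
```

To use it, call disable_pdfbox with the path to Setting.txt in the Configurations folder from step 1. Writing to a folder under Program Files typically requires administrator rights.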