Disable PDFBox option
- Updated: 2021/04/20
The PDFBox option is enabled by default. Disable the option when you are training with hybrid PDF documents that contain both images and text.
The PDFBox option works best with fully digital documents.
For hybrid documents containing images and text, we recommend that you
disable the PDFBox option for better document classification.
Note: The PDFBox option is enabled in the system
by default. Keep PDFBox enabled only if you
plan to process digital documents; otherwise, processing will fail.
If PDFBox is enabled, you can process the following PDF types:
- Vector and hybrid PDFs are processed using PDFBox.
- Raster PDFs are first processed using PDFBox; if no segment is found, the PDF is processed again using Document Image OCR.
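The fallback behavior described above can be illustrated with a minimal sketch. This is not IQ Bot code: the function `extract_segments` and the two extractor callables are hypothetical names introduced only for illustration, and the assumption that a disabled PDFBox option sends the document straight to OCR is ours, not something the product documentation states.

```python
from typing import Callable, List

# Hypothetical extractor signature: takes a PDF path, returns text segments.
Extractor = Callable[[str], List[str]]


def extract_segments(
    pdf_path: str,
    pdfbox_enabled: bool,
    pdfbox_extract: Extractor,
    ocr_extract: Extractor,
) -> List[str]:
    """Return text segments, falling back to OCR when PDFBox finds none."""
    if pdfbox_enabled:
        segments = pdfbox_extract(pdf_path)
        if segments:
            # Vector and hybrid PDFs usually yield segments here.
            return segments
        # No segment found (typical for raster PDFs): process again with OCR.
    # Assumption: with PDFBox disabled, the document goes directly to OCR.
    return ocr_extract(pdf_path)
```

Passing the extractors in as parameters keeps the sketch independent of any real PDF or OCR library, so the control flow can be read and tested on its own.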
There are two ways to disable or enable the PDFBox option
in IQ Bot:
- Directly in the UI when you create a learning instance. On the Create new learning instance page, disable or enable the My PDF documents do not have images check box.
- In the Setting.txt file described as follows: