Using the Extract text action
Extract text from a PDF file and save it as a text file by using the Extract text action.
To extract text from a PDF file, perform the following steps:
- In the Actions palette, double-click or drag the Extract text action from the PDF package.
In the PDF path field, select one of the following options to specify the location of the PDF file:
- Control Room file: Enables you to select a PDF file that is available in a folder in the Control Room.
- Desktop profile: Enables you to select a PDF file that is available on your device.
- Variable: Enables you to specify the file variable that contains the location of the PDF file.
In the User password or Owner password field, enter the password that is required to access the encrypted PDF file.
- User password: The password that allows users to open the encrypted PDF file.
- Owner password: The password that allows users to perform restricted operations on the encrypted PDF file.
In the Text type field, select one of the following options:
- Plain text: Extract the text and copy it to a text file. This works similarly to copying and pasting text from a PDF file to a text file.
- Structured text: Preserve the original formatting of the text extracted from the PDF file. You can select the Reduce Data Loss option to ensure that the complete text is extracted with minimal overlap of characters. This option reduces the number of characters that are overlapped by other characters.
Note: When you select the Reduce Data Loss option to extract text, the extracted text might contain extra space characters.
In the Page range field, select one of the following options:
- All pages: Extracts text from all the pages in the PDF file.
- Pages: Enables you to enter the page numbers of the pages from which you want to extract text.
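The following Python sketch illustrates how a page selection entered for the Pages option could be expanded into individual page numbers. It is not the implementation used by the action. The function name expand_page_selection and the comma and hyphen syntax (for example, 1,3-5) are assumptions made for this illustration; check the field in your environment for the format it accepts.

```python
from __future__ import annotations


def expand_page_selection(spec: str, page_count: int) -> list[int]:
    """Expand a selection such as "1,3-5" into sorted 1-based page numbers.

    Hypothetical helper: the comma/hyphen syntax is an assumption,
    not a documented format of the Pages field.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject pages outside the document and reversed ranges.
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page selection: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 10-page file, expand_page_selection("1,3-5", 10) yields pages 1, 3, 4, and 5, and a selection that points past the last page raises an error instead of being silently skipped.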
In the Export data to text file field, specify a name and location for the text file.
Note: You must include the .txt extension in the name of the text file. For example, if the file name is June_Quarter_report, enter June_Quarter_report.txt.
Select the Overwrite files with the same name check box
to overwrite existing files with the same name.
Note: If this option is not selected and the bot encounters a file with the same name at the specified location, the bot will fail.
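The two rules above (the .txt extension requirement and the overwrite behavior) can be checked before a bot runs. The following Python sketch is only an illustration of those rules, not part of the PDF package. The function name check_output_path and the specific exception types are assumptions chosen for this example.

```python
from pathlib import Path


def check_output_path(path_text: str, overwrite: bool) -> Path:
    """Validate an output path against the rules described for this action.

    Hypothetical helper: it mirrors the documented rules but is not
    the check that the action itself performs.
    """
    path = Path(path_text)
    # The file name must include the .txt extension.
    if path.suffix.lower() != ".txt":
        raise ValueError(f"output file must end with .txt: {path.name!r}")
    # Without the overwrite option, an existing file is treated as a failure.
    if path.exists() and not overwrite:
        raise FileExistsError(f"file already exists: {path}")
    return path
```

Calling check_output_path with a name such as June_Quarter_report (no extension) raises an error, as does pointing at an existing file with overwrite set to False.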
From the Assign PDF properties to a dictionary variable
list, select a dictionary variable to hold the file properties.
For more information, see Using a dictionary variable for PDF properties.
- Click Save.