Document extraction overview

The document extraction process enables you to define specific form and table fields that you want to extract from your documents.

The system then automatically extracts the specified data from these documents so that it can be analyzed and passed to downstream workflows. Automating this step improves the efficiency and accuracy of data processing.

When you create a learning instance, the Control Room automatically creates a folder with the same name as the learning instance in the Automation > Document Workspace folder. Within that folder, the Control Room creates the following two bots:

  • Extraction bot: Extracts data from defined fields in uploaded documents.
  • Download bot: Downloads the extracted data to a specific folder on the device or on a shared network, depending on the output results option configured in the Download bot.

The Document Extraction package extracts data from documents and downloads the extracted data to a specific location.

The Document Extraction package provides the following capabilities:

  • Diverse document types: Process a wide range of document types for various document processing use cases. You can integrate your custom data extraction parsers to leverage your pre-trained, domain-specific models for your document processing workflows.
  • Validation rules: Define conditions, such as pattern matching or equality checks. When these conditions are met, you can take action to flag errors or warnings, clean or replace values, or set new values. These rules help ensure the accuracy of extracted data across multiple fields in your documents.
  • Generative AI providers: Extract data from different document types by using pre-trained models from generative AI providers such as Azure OpenAI or Anthropic. You define search queries once when configuring fields, and the data is then extracted from every processed document without any additional configuration.
  • Validation feedback: Provide feedback on extracted data accuracy by verifying and correcting the extracted data. This creates a feedback loop that helps the system improve data accuracy over time.
  • Automation Co-Pilot validator: Provides a user-friendly interface that highlights errors or warnings in documents. The validator displays a red outline around fields that require validation. You can validate the data in those fields and submit the documents for reprocessing.
  • Integration with Automation 360: Seamlessly integrate the extracted data into various workflows for further processing in Automation 360.
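
To make the validation rules capability more concrete, the following is a minimal Python sketch of the idea: a rule pairs a condition (a pattern match or an equality check) with an action (flag an error or warning, clean or replace a value, or set a new value). It is an illustration only and not the product's rule syntax or API. Every name in it (Rule, Finding, apply_rules, equals, not_matches) and the sample field names and values are assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Finding:
    """A flagged issue on one field (hypothetical structure)."""
    field_name: str
    severity: str  # "error" or "warning"
    message: str


@dataclass
class Rule:
    """A condition on one field plus the action to take when it is met."""
    field_name: str
    condition: Callable[[str], bool]                  # True when the rule fires
    severity: Optional[str] = None                    # flag an error or warning
    message: str = ""
    transform: Optional[Callable[[str], str]] = None  # clean, replace, or set


def not_matches(pattern: str) -> Callable[[str], bool]:
    """Pattern-matching condition: fires when the whole value does NOT match."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is None


def equals(expected: str) -> Callable[[str], bool]:
    """Equality condition: fires when the value equals `expected`."""
    return lambda value: value == expected


def apply_rules(
    record: Dict[str, str], rules: List[Rule]
) -> Tuple[Dict[str, str], List[Finding]]:
    """Apply rules in order; return an updated copy and the flagged findings."""
    result = dict(record)  # leave the caller's record untouched
    findings: List[Finding] = []
    for rule in rules:
        value = result.get(rule.field_name, "")
        if not rule.condition(value):
            continue
        if rule.transform is not None:
            result[rule.field_name] = rule.transform(value)
        if rule.severity is not None:
            findings.append(Finding(rule.field_name, rule.severity, rule.message))
    return result, findings


# Sample rules and data (all values are made up for illustration).
rules = [
    # Clean a value: remove thousands separators from the total.
    Rule("total", condition=lambda v: "," in v,
         transform=lambda v: v.replace(",", "")),
    # Flag an error: the invoice number does not match the expected pattern.
    Rule("invoice_number", condition=not_matches(r"INV-\d{6}"),
         severity="error", message="Unexpected invoice number format"),
    # Set a new value and flag a warning: the currency is empty.
    Rule("currency", condition=equals(""), transform=lambda _: "USD",
         severity="warning", message="Currency missing; defaulted to USD"),
]

extracted = {"invoice_number": "INV-12A456", "total": "1,250.00", "currency": ""}
cleaned, findings = apply_rules(extracted, rules)

print(cleaned)
for finding in findings:
    print(finding.severity, finding.field_name, finding.message)
```

In this sketch, the rules run in the order listed and each rule sees the value left by the rules before it, so a cleaning rule can normalize a field before a later rule checks it. Fields that end up with an error or warning finding correspond to the ones a reviewer would be asked to validate.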