Use Co-Pilot for Automators to simplify post-processing of data extracted by a Document Automation learning instance.

Note: Generative AI models can produce errors and/or misrepresent the information they generate. It is advisable to verify the accuracy, reliability, and completeness of the content generated by the AI model.

Why prompt for data transformation

Once Document Automation extracts data from a document, the real power comes from shaping that data exactly the way you need it. Transforming, normalizing, and enriching extracted data is a natural next step, and with Co-Pilot for Automators, it's never been more straightforward.

Co-Pilot removes the traditional complexity from data transformation. Instead of navigating multiple command packages, memorizing internal data structures, or writing Python or JavaScript by hand, developers can simply describe what they want in plain English. Co-Pilot handles the rest, automatically generating a ready-to-run task bot from a single prompt.

Use Cases

Post-processing tasks fall into three complexity tiers:
Tier: Simple
  Description: Single string value converted to another value based on defined logic.
  Examples:
    • Convert "11,435.0000" → "11,435.00"
    • Convert "(1,500.00)" → "-1,500.00"
Tier: Medium
  Description: Multiple manipulations, or logic that depends on values from more than one field.
  Examples:
    • Normalize negative numbers from multiple formats (brackets, trailing minus, CR suffix) to standard form
    • Convert partial date using a year from another field: 10/SEP → 09/10/2024
    • Remove duplicate rows from a table
Tier: Complex
  Description: Transformations involving external data sources or fully custom logic.
  Examples:
    • Fuzzy-match extracted VendorName and VendorAddress against a CSV file using Vendor_ID, Vendor_Name, Street, City, State, ZIP columns
Note: Co-Pilot is designed to generate Python that resolves Simple and Medium cases. For Complex cases, it provides a working scaffold that developers can extend with custom Python logic.
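As an illustration of the Medium tier, the following is a minimal sketch of the kind of Python such a prompt might produce. It is not the code Co-Pilot actually generates. The function names normalize_negative and complete_partial_date are hypothetical, and the sketch assumes that the "standard form" is a leading minus sign and that the target date format is MM/DD/YYYY, as in the example above.

```python
import re

# Month abbreviations mapped explicitly so parsing does not depend on locale.
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}

# Assumed negative notations: (x), x- and x CR.
NEGATIVE_PATTERNS = (
    r"^\((.+)\)$",    # brackets, e.g. (1,500.00)
    r"^(.+?)\s*-$",   # trailing minus, e.g. 1,500.00-
    r"^(.+?)\s*CR$",  # CR suffix, e.g. 1,500.00 CR
)


def normalize_negative(value: str) -> str:
    """Rewrite a bracketed, trailing-minus, or CR-suffixed number with a leading minus."""
    text = value.strip()
    for pattern in NEGATIVE_PATTERNS:
        match = re.match(pattern, text, flags=re.IGNORECASE)
        if match:
            return "-" + match.group(1).strip()
    return text  # already in standard form, or not a negative notation


def complete_partial_date(day_month: str, year: str) -> str:
    """Combine a DD/MON value with a year taken from another field."""
    day, month = day_month.strip().split("/")
    return f"{MONTHS[month.upper()]:02d}/{int(day):02d}/{int(year):04d}"


print(normalize_negative("(1,500.00)"))         # -1,500.00
print(complete_partial_date("10/SEP", "2024"))  # 09/10/2024
```

Values that match none of the assumed notations are returned unchanged, so the function is safe to apply to every row of a column.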

Prompt Template

Use the following template when invoking Co-Pilot for Automators. Only plain-English instructions are needed; no special syntax is required.

Get fields: [<field name>, <field name>, ...]
Transformation description: <describe the transformation in plain English>
Example: [optional — provide a before/after example if helpful]
Update fields: [<field name>, <field name>, ...]

Adapt the following example by substituting your own field names and transformation description.

Example Prompt

Get fields: [Total Amount]
Transformation description: If the field has 4 decimal digits, round it to 2 decimal digits.
Example: "11,435.0000" → "11,435.00"
Update fields: [Total Amount]
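For orientation, the transformation requested by this prompt could be expressed in Python roughly as follows. This is a hedged sketch, not the output Co-Pilot produces. The function name round_four_decimals is hypothetical, and two choices are assumptions: half-up rounding, and keeping thousands separators only when the input had them.

```python
import re
from decimal import Decimal, ROUND_HALF_UP


def round_four_decimals(value: str) -> str:
    """Round a value with exactly 4 decimal digits to 2; leave anything else unchanged."""
    text = value.strip()
    # Only act on values such as 11,435.0000 or -12.3456 (exactly four decimals).
    if not re.fullmatch(r"-?[\d,]+\.\d{4}", text):
        return value
    number = Decimal(text.replace(",", ""))
    # Assumption: half-up rounding; the prompt does not specify a rounding mode.
    rounded = number.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Keep thousands separators only if the input used them.
    return format(rounded, ",.2f" if "," in text else ".2f")


print(round_four_decimals("11,435.0000"))  # 11,435.00
```

Decimal is used instead of float so that values such as 1.2350 round as written rather than according to their binary approximation.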

Building the task bot

The following is a reference for the task bot that Co-Pilot generates and the sequence of actions it contains. The task bot is intended for use in a larger process that runs your source document through Document Automation. See Add Document Extraction Task to a process automation.
Note: As a prerequisite, run the Extract Data action on the document to store the data in a Record variable. The primary output (Recordset variable) is passed to the following steps in this transformation. See Extract data action.
  1. Get Document Data: Uses your variable to retrieve the full document data and stores the output in a variable as a Recordset (example: $DocumentData$).
    Note: The Python package cannot operate on Recordset objects directly. The next step converts the Recordset to a JSON string.
  2. Convert Record to JSON String: Uses the String: Assign action to serialize $DocumentData$ into a plain JSON string. This is the representation that Python will read and modify. Stores the result in a variable (example: $DocumentJson$).
  3. Open and Apply Python Logic: Opens a Python session and passes $DocumentJson$ as input. The session contains a function (example: <normalize data>) that applies the transformation described in the prompt (see the sample JSON under Usage Guidelines).

    Result: The function returns the same JSON structure as the input, with the updated field values. The return value is stored in a variable (example: $UpdatedJson$).

  4. Update Document Data: Passes $UpdatedJson$ to the Update Data action, which sends the transformed values back to the server.
    Note: Ensure that the output (example: $UpdateOutput$) is configured as an output variable. It carries the document status, which a parent process uses to determine whether the document should be routed to a validation queue.
Variable | Purpose
$DocumentData$ | Recordset output from the Get Document Data action.
$DocumentJson$ | JSON string serialized from $DocumentData$; passed into Python.
$UpdatedJson$ | JSON string returned from Python with transformed field values.
$UpdateOutput$ | Status result from the Update Data action; used as output variable at the process level.

End-to-End Flow of the task

The complete sequence, starting with the Extract Data prerequisite, is as follows.

# | Action | Output Variable
1 | Document Extraction: Extract Data | Document ID
2 | Document Extraction: Get Document Data | DocumentData (Recordset)
3 | String: Assign | DocumentJson (string)
4 | Python Script: Open (custom logic) and Python Script: Execute function <normalize data> | UpdatedJson (string)
5 | Document Extraction: Update document data | UpdateOutput (output variable)

JSON Reference

The DocumentJson variable holds the full document record as a JSON object. The Python function receives this object, applies the requested transformations to the relevant fields, and returns the same structure with updated values. Field names and the overall schema must remain unchanged.
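The contract described above (JSON string in, same structure out) can be sketched as follows. This is an illustrative outline only. The names normalize_data and apply_to_fields are hypothetical, the "fields" key and the "value"/"bounds" entries are taken from the sample JSON in this topic, and str.strip stands in for whatever transformation the prompt requested.

```python
import json
from typing import Callable, Iterable


def apply_to_fields(
    document_json: str,
    field_names: Iterable[str],
    transform: Callable[[str], str],
) -> str:
    """Apply transform to the "value" of each named field; leave everything else as-is."""
    document = json.loads(document_json)
    fields = document.get("fields", {})
    for name in field_names:
        entry = fields.get(name)
        # Skip fields that are missing or do not hold a string value.
        if isinstance(entry, dict) and isinstance(entry.get("value"), str):
            entry["value"] = transform(entry["value"])
    # Keys, bounds, pages, and tables are serialized back untouched.
    return json.dumps(document)


def normalize_data(document_json: str) -> str:
    """Hypothetical entry point: takes the document JSON string and returns the updated one."""
    # Placeholder transformation: trim whitespace from invoice_number.
    return apply_to_fields(document_json, ["invoice_number"], str.strip)
```

Only the "value" entries of the listed fields are rewritten, so the returned string keeps the field names and schema that the update step expects.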

Usage Guidelines
  • Use the prompt template provided. Deviating from the four-line structure can produce incomplete output.
  • Keep field names in the prompt consistent with the field names as they appear in the document extraction output.
  • For Medium and more complex tasks, include an example in the prompt to reduce ambiguity.
  • For Complex tasks, review the generated scaffold carefully. Update the Python script to integrate your external data source (CSV, database, API) before running.
  • Your $UpdateOutput$ must always be an output variable at the process level. Do not discard it.
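For the Complex tier, extending a scaffold with an external CSV source might look like the sketch below. It is one possible approach, not the scaffold Co-Pilot generates. It uses the standard-library difflib for similarity scoring. The function names, the equal weighting of name and address, and the 0.8 threshold are assumptions, and the two vendor rows are made-up sample data; only the column names come from the use case described earlier.

```python
import csv
import io
from difflib import SequenceMatcher

ADDRESS_COLUMNS = ("Street", "City", "State", "ZIP")


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_vendor_match(vendor_name, vendor_address, rows, threshold=0.8):
    """Return the CSV row that best matches the name and address, or None if below threshold."""
    best_row, best_score = None, 0.0
    for row in rows:
        address = " ".join(row[column] for column in ADDRESS_COLUMNS)
        # Assumption: name and address contribute equally to the score.
        score = (similarity(vendor_name, row["Vendor_Name"])
                 + similarity(vendor_address, address)) / 2
        if score > best_score:
            best_row, best_score = row, score
    return best_row if best_score >= threshold else None


# Made-up sample data standing in for the external CSV file.
SAMPLE_CSV = """Vendor_ID,Vendor_Name,Street,City,State,ZIP
10001,Acme Supplies Inc,12 Main Street,Springfield,IL,62701
10002,Globex Corporation,400 Oak Avenue,Portland,OR,97201
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
match = best_vendor_match("ACME Supplies Inc.", "12 Main St Springfield IL 62701", rows)
print(match["Vendor_ID"] if match else "no match")
```

In a real task bot the rows would be read from the CSV file on disk, and the matched Vendor_ID would be written back into the corresponding field of the document JSON.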
Use the sample JSON to understand the expected schema when writing or reviewing Python transformation logic.
{
	"pages": [
		{
			"width": 1700,
			"height": 2200
		}
	],
	"fields": {
		"VendorID": {
			"value": "10001",
			"bounds": "0,0,0,0"
		},
		"vendor_name": {
			"value": "",
			"bounds": "0,0,0,0"
		},
		"invoice_number": {
			"value": "10280",
			"bounds": "1446,444,72,20"
		}
	},
	"tables": {
		"table": [
			{
				"quantity": {
					"value": "1",
					"bounds": "188,783,13,17"
				},
				"total_price": {
					"value": "22.00",
					"bounds": "1506,784,69,18"
				}
			},
			{
				"quantity": {
					"value": "1",
					"bounds": "188,819,13,17"
				},
				"total_price": {
					"value": "36.75",
					"bounds": "1508,817,65,20"
				}
			}
		]
	}
}
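Table rows in this schema can be processed the same way as fields. As a hedged example of the "remove duplicate rows" use case, the sketch below treats two rows as duplicates when every column has the same "value", ignoring "bounds". That definition of a duplicate is an assumption, and the function name remove_duplicate_rows is hypothetical.

```python
import json


def remove_duplicate_rows(document_json: str) -> str:
    """Drop table rows whose column values repeat an earlier row; keep the first occurrence."""
    document = json.loads(document_json)
    tables = document.get("tables", {})
    for table_name, rows in list(tables.items()):
        seen = set()
        unique_rows = []
        for row in rows:
            # Compare rows by (column, value) pairs only; bounds differ per row.
            key = tuple(sorted((column, cell.get("value")) for column, cell in row.items()))
            if key not in seen:
                seen.add(key)
                unique_rows.append(row)
        tables[table_name] = unique_rows
    return json.dumps(document)
```

Applied to the sample above, both rows are kept because their total_price values differ.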