Use the new Structured data extraction action to capture blocks of structured data from HTML-based web applications. At runtime, the Recorder identifies objects similar to the one you selected and organizes them into rows, with their child elements mapped to columns.
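
The row-and-column mapping described above can be pictured with a minimal Python sketch. It is an illustration only, not the Recorder's implementation: the sample markup, the `extract_rows` helper, and the XPath-style selector are all hypothetical, and the sketch assumes well-formed markup so that the standard-library XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page fragment: two similar objects (list items),
# each with the same child elements (name and price).
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Keyboard</span><span class="price">$25</span></li>
  <li class="product"><span class="name">Mouse</span><span class="price">$10</span></li>
</ul>
"""


def extract_rows(markup, row_path):
    """Return one row per matching object and one column per child element."""
    root = ET.fromstring(markup.strip())
    rows = []
    for item in root.findall(row_path):
        # Each direct child of the matched object becomes one column value.
        rows.append([(child.text or "").strip() for child in item])
    return rows


rows = extract_rows(SAMPLE_HTML, ".//li[@class='product']")
print(rows)  # [['Keyboard', '$25'], ['Mouse', '$10']]
```

Each `li` element plays the role of a similar object (a row), and its `span` children supply the column values.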

Note: This action is available for building automations on the Windows platform only.

Prerequisites

Ensure that you use the Browser extension (version 4.1.0.0 or later).

Action parameters

  • Double-click or drag Recorder > Structured data extraction.
  • Specify the window in which to capture an object. Choose from the Application, Browser, or Variable tab.
    • Application: Select from a list of currently active windows. This option shows a list of all the application and browser windows that are open on the Bot Creator device.
    • Browser: Select from a list of supported browser tabs such as Google Chrome and Chromium-based Microsoft Edge browsers.
    • Variable: Select an existing window variable to specify the application window title.
  • Click Capture object.

    The selected window appears.

  • Move the mouse pointer over an object that has other similar elements on the page.

    A red rectangular box appears around the object.

    Figure: Rectangular box around the captured object

  • Click the object to capture.
  • Review the Object properties table.
    Important: We recommend that you deselect properties such as HTML ID, Path, and any other properties that might change with every page. Include properties such as DOMXPath and CSS selector instead.
  • From the Data extraction type, select System or Custom.
    • System: This mode automatically detects and extracts repeating data patterns from the selected section of the web page. It identifies common fields such as text, images, and hyperlinks, and structures them into columns. This mode is ideal when you want to quickly extract standard data layouts, such as product lists or tables, without configuring each column manually.
    • Custom: Custom mode is best suited for complex or non-standard page layouts that require higher precision. It is especially useful when the captured element contains many child elements, but you need to extract only a select few. Additionally, custom mode ensures that the extracted data remains in a fixed, predefined number of columns.
      Select Run custom extraction to extract all data points from the captured object, including the DOMXPath and its sample value. You can then manually configure or refine the extracted data: edit XPath expressions, rename columns, and add, remove, or rearrange elements as needed.
      Important: Using a variable in the Application tab might cause an error when running custom extraction. We recommend that you select the specific window from the drop-down in the Application tab and run the custom extraction without saving the bot.
  • From the Set system time out field, select either Basic or Advanced.
    • Basic: In the Wait for system response (in seconds) field, specify the number of seconds the bot must wait for the object control to appear on the application window. This wait time covers both the page load and the object search.
      Note: The timeout that you specify for the control to appear applies only if the window that contains the control exists. The Recorder first looks for the application window and only then searches for the object inside that window. The default time to search for the window is 30 seconds. Hence, even if you specify a wait time of 5 seconds, the bot still waits for 30 seconds if the window does not exist.

      We recommend that you first use the If > Window exists condition with a wait time of zero seconds to confirm that the application window exists. If the window exists, use the Recorder with a wait time of 5 seconds and run the bot to detect the object.

    • Advanced: Use this option to automate websites that continuously load and update with the latest data, such as a stock market website. Technically, such websites are never fully loaded on screen. In such cases, the bot does not need to wait for the web page to load completely and can proceed with automating the web page after a certain time.

      In the Wait for browser response (in seconds) field, specify the number of seconds the bot must wait for the browser to load. Select one of the following options:

      • Stop the bot and display an error message: If the web page has not loaded completely within the specified time out, select this option to stop the bot and display an error message.
      • Skip and proceed to the object: Select this option to proceed to the object directly and capture it even if the web page has not loaded completely.

      In the Wait for object response (in seconds) field, specify the number of seconds the bot must wait for the object control to appear on the application window.

    • Page has lazy loading: Select this option for pages where data loads dynamically and continues to auto-load. For example, items are loaded on the page as you scroll.
      • Retry attempts: Enter the number of retry attempts you need for checking the new data.
      • Wait time between retries: Enter the number of seconds you want the automation to wait between retry attempts.
  • In the Save the outcome to a variable field, create a Data Table variable to store the output.

    The extracted data is stored in a data table, where similar objects are arranged as rows and their child elements are represented as columns.
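
The Basic timeout behavior described in the note above can be modeled with a small Python sketch. It is a simplified model for reasoning about worst-case wait times, not product code: the function names are hypothetical, the 30-second default comes from the note, and the sketch assumes that finding an existing window takes negligible time.

```python
# Default window search time stated in the Basic timeout note.
DEFAULT_WINDOW_SEARCH_SECONDS = 30


def worst_case_wait(window_exists, object_wait_seconds,
                    window_search_seconds=DEFAULT_WINDOW_SEARCH_SECONDS):
    """Upper bound, in seconds, on how long the Recorder step can block."""
    if not window_exists:
        # The window search runs first with its own default timeout,
        # so the configured object wait never comes into play.
        return window_search_seconds
    # The window was found, so only the configured object wait applies.
    return object_wait_seconds


def guarded_wait(window_exists, object_wait_seconds):
    """Model of the recommended pattern: If > Window exists (0 s), then Recorder."""
    if not window_exists:
        # The If condition fails immediately and the Recorder step is skipped.
        return 0
    return object_wait_seconds


print(worst_case_wait(False, 5))  # 30
print(worst_case_wait(True, 5))   # 5
print(guarded_wait(False, 5))     # 0
```

Under these assumptions, checking for the window first avoids the 30-second wait when the window is missing.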

Known product behavior

  • Only textual content is extracted. Tags such as img, input, select, button, script, and style are skipped.
  • When using the Data Table > Write to file action to save the data generated by the Structured data extraction action into a CSV file, ensure that you select UTF-8 as the encoding.
  • Similar elements within a shadow DOM are not supported.
  • The system cannot find objects if the originally captured object is unavailable.
  • Secure recording is currently not supported.
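
The text-only behavior listed above can be approximated with the following Python sketch, which uses the standard-library `html.parser` module. It is an illustration under stated assumptions rather than the product's logic: the class and function names are hypothetical, and the sketch assumes that the content nested inside a skipped tag is skipped along with the tag itself.

```python
from html.parser import HTMLParser

# Tags named in the known-behavior list as skipped during extraction.
SKIPPED_TAGS = {"img", "input", "select", "button", "script", "style"}
# Void elements have no closing tag, so they never open a skipped region.
VOID_TAGS = {"img", "input"}


class TextOnlyExtractor(HTMLParser):
    """Collect text that is not nested inside any skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS and tag not in VOID_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def visible_text(markup):
    """Return the text fragments that would survive a text-only extraction."""
    parser = TextOnlyExtractor()
    parser.feed(markup)
    parser.close()
    return parser.chunks


sample = ('<div><span>Café</span><img src="a.png"><button>Buy</button>'
          '<script>var x = 1;</script><span>$3</span></div>')
print(visible_text(sample))  # ['Café', '$3']
```

The sample also contains a non-ASCII character, which is the kind of value that makes UTF-8 encoding matter when the extracted table is later written to a CSV file.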

Use cases

Below are some websites where you can test structured data extraction: