Generative Recorder - Vision-based fallback

The vision-based fallback in the Generative Recorder increases automation resiliency by using vision models as an additional fallback mechanism. It improves fallback efficacy and provides benefits such as business continuity, reduced maintenance effort, and adherence to organization SLAs.

A vision model, often referred to as a computer vision model, is an artificial intelligence system designed to interpret, analyze, and understand visual data (such as images or videos) through advanced machine learning techniques. A vision-based fallback is a mechanism used in automation processes to increase resiliency and reduce execution failures. It uses vision models to identify changes and adapt to them in real time, so that automation tasks continue smoothly even when unexpected changes occur.
Note: Generative Recorder, text-based fallback, and native fallback do not use any recommendations from your Automator AI quota. Vision-based fallback consumes one recommendation per fallback, but only when the fallback is actually triggered at runtime. The number of automations with vision-based fallback enabled does not affect your quota.
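
The quota rule in the note above can be illustrated with a small sketch. This is only an illustration of the accounting, not product code: the names FALLBACK_COST, QuotaTracker, and quota_after_run are hypothetical and do not correspond to any Automation Anywhere API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cost table for illustration only (not a product API).
# Per the note: native and text-based fallbacks cost nothing;
# a triggered vision-based fallback costs one recommendation.
FALLBACK_COST = {
    "native": 0,
    "text": 0,
    "vision": 1,
}


@dataclass
class QuotaTracker:
    """Tracks the remaining Automator AI recommendations (hypothetical model)."""

    remaining: int

    def record_fallback(self, kind: str) -> int:
        """Deduct the cost of a fallback that was actually triggered at runtime."""
        self.remaining -= FALLBACK_COST[kind]
        return self.remaining


def quota_after_run(start: int, triggered: list[str]) -> int:
    """Return the quota left after the listed fallbacks were triggered.

    Only triggered fallbacks appear in `triggered`; merely enabling
    vision-based fallback on an automation adds nothing to this list.
    """
    tracker = QuotaTracker(remaining=start)
    for kind in triggered:
        tracker.record_fallback(kind)
    return tracker.remaining


# Two vision-based fallbacks triggered, so two recommendations are deducted.
print(quota_after_run(100, ["native", "text", "vision", "vision"]))  # 98
```

In this model, a run in which no fallback is triggered leaves the quota unchanged, regardless of how many automations have vision-based fallback enabled.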

Capabilities

Generative Recorder leverages our automation-tuned ensemble models to achieve deep visual understanding of business applications.

Vision-based fallback can:
  • Accurately identify modified UI structures that traditional methods might miss.
  • Adapt to layout and design changes without manual intervention.
  • Enhance automation efficacy by preventing failures.

For information about available features, see Generative Recorder.

To enable the vision-based fallback:
  1. Log in as a Bot Creator.
  2. From the Bot editor, navigate to Advanced settings > Package settings.
  3. In Package settings > Recorder, enable Generative AI vision-based fallback.

Vision-based fallback selection settings

Image sanitization in vision-based fallback

Note: Vision-based fallback might not function correctly if a separate Python installation exists on your system, as this can cause image masking failures. Specifically, the embedded Python used by image masking is extracted only during bot execution and does not interfere; however, any additional Python installation visible in Control Panel > Programs and Features can impact vision-based fallback operations. To ensure reliable performance of vision-based fallback, uninstall any such Python installation found in Programs and Features.

To mitigate data security and privacy risks, the Generative Recorder performs image sanitization locally on your machine before any data is sent outside your environment. This process is handled entirely by the Recorder package running on the device.

The Choose image sanitization method setting allows you to select how and where this sanitization takes place. The following options are available:
  • Cloud-based sanitization: Screenshots of the target application are securely sent to the Automation Anywhere Cloud Service. Once received, the images are automatically sanitized in the cloud before being processed by the AI model for analysis.

    You can choose this option if you prefer centralized processing for improved performance and minimal impact on local device performance.

  • Local sanitization: The sanitization of screenshots happens directly on your device before any image is sent for AI analysis.

    You can choose this option if your organization prioritizes local data handling, regulatory compliance, or restricted network environments.

During the sanitization process, all business data visible in the captured application image is redacted. This includes not only personally identifiable information (PII) but also any sensitive business content that appears on the screen.

Only after this comprehensive local sanitization is complete are the resulting image and extracted text sent to the region-based AI service for further processing. At no time is raw or unsanitized image data shared beyond your environment.
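
The redact-first, send-second ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Recorder's implementation: the image is modeled as a grid of grayscale values, the regions to redact are supplied as bounding boxes (how the Recorder locates business data is not described here), and redact, sanitize_then_send, and the send callback are hypothetical names.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, row-major (assumed model)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def redact(image: Image, boxes: List[Box], fill: int = 0) -> Image:
    """Return a copy of `image` with every box overwritten by `fill`."""
    out = [row[:] for row in image]  # copy so the original stays untouched
    for top, left, bottom, right in boxes:
        # Clamp each box to the image bounds before overwriting pixels.
        for r in range(max(top, 0), min(bottom, len(out))):
            for c in range(max(left, 0), min(right, len(out[r]))):
                out[r][c] = fill
    return out


def sanitize_then_send(
    image: Image,
    boxes: List[Box],
    send: Callable[[Image], None],
) -> Image:
    """Redact first, then hand only the sanitized copy to `send`.

    `send` stands in for whatever transmits data to the AI service;
    the raw image is never passed to it.
    """
    sanitized = redact(image, boxes)
    send(sanitized)
    return sanitized


# Example: redact the two right-hand columns before "sending".
screenshot = [[1, 2, 3], [4, 5, 6]]
outbox: List[Image] = []
sanitize_then_send(screenshot, [(0, 1, 2, 3)], outbox.append)
print(outbox[0])  # [[1, 0, 0], [4, 0, 0]]
```

The point of the design is that the function that transmits data only ever receives the redacted copy, so the unsanitized screenshot cannot leave the device through that path.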