Review what's new and the fixes and limitations in AI Governance, AI Guardrails, and AI Evaluations for the v.40 release.
Important: These release notes are applicable only to Automation 360 v.40 release for Cloud-Sandbox environment (Build 45794). The v.40 release for Cloud is not yet generally available (GA), so the content on these pages is subject to change until the Cloud GA.

What's new

AI Evaluations deliver governed, actionable performance insights for Agents and Skills

AI Evaluations introduces controlled, metered evaluation of AI Agents and AI Skills, with licensing and AI credit consumption tied to entitlement tracking and enforcement for Cloud environments. This capability lets teams validate and benchmark AI performance with automated evaluation built into the AI Agent development lifecycle. Licensed users can access the evaluation feature, along with automated scores and details, on the new evaluation pages under the AI menu. See AI Evaluations.

Available for Cloud environments only.

  • Entitlement and usage controls: Requires appropriate licensing (APA Essentials or APA Pro) and AI credits, with usage tracking and enforcement.
  • Automatic and manual tools: Built-in support for automatic and manual evaluations using predefined metrics for measuring performance and scoring details.
  • Detailed insights: Scores are backed by industry and research metrics, with breakdowns that illuminate expected versus actual interactions, execution sequences, and behavior patterns.
  • Flexible dataset support: Upload, reuse, or manually define datasets with secure, audit-aligned retention for repeatable evaluation cycles. The maximum file size is 50 MB. Datasets are retained for 1 year, and the retention period resets each time a dataset is used.
    Note: Upload is available only when evaluating AI Skills.

AI Evaluations helps teams optimize quality, reliability, and governance of AI-powered automations and agentic processes before production deployment and post-deployment.

Perform AI Evaluations for AI Skills and AI Agents and view insights in Detailed Evaluation view

The Run Evaluation flow now supports AI Agents. You can invoke Agent evaluations from the Evaluations page or directly from the Agent Editor.

You can view evaluations from the Agent Editor and from the Evaluations landing page. A summary is available for the overall evaluation, and you can investigate further by selecting the evaluation details on the page, which summarize the scores for the executed dataset. A detailed view of each dataset execution is available through the Agent output details, including:
  • Metrics
  • Scores
  • Reasoning

Event logs and data retention policy for AI Evaluations

When an AI Evaluation runs, an Event log entry is created in AI Governance for audit purposes. AI Evaluations data includes date and user information for security and for control over versions and modifications. Storage and retention of this data follow the existing retention policy of our platform framework. See Data retention policy.

AI Agent Audit logs now available in AI Governance
Get complete visibility and traceability of AI Agent activities and interactions with LLMs for governance and compliance auditing. Comprehensive audit trails help ensure compliance with security policies and responsible AI governance requirements.
  • Track all agent executions from start to completion with detailed input/output logging.
  • Monitor LLM interactions, tool calls, and system responses in real-time.
  • 180-day log retention with drill-down capabilities for investigation.

What's changed

Expanded AI Governance logging for system prompts with Toxicity visibility

AI Governance now captures system prompt details and toxicity scores in Prompt logs and Event logs, even when user prompts are blocked by AI Guardrails. When either the system prompt or the user prompt exceeds the toxicity threshold configured in the guardrail policy, the prompt is blocked, and the toxicity levels of both the system and user prompts are recorded in the logs.

This enhanced visibility clarifies why prompts were blocked and supports scoring and analysis of system prompt toxicity alongside user inputs, improving auditability and alignment with guardrail policies for safer and more transparent automation behavior.
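The threshold-and-log behavior described above can be sketched generically. The following is an illustrative example only, not Automation 360's implementation; all names (`evaluate_prompts`, `PromptLogEntry`, the score scale) are hypothetical assumptions:

```python
# Illustrative sketch of threshold-based prompt blocking with dual logging.
# NOT the AI Guardrails implementation; all names and scales are hypothetical.
from dataclasses import dataclass

@dataclass
class PromptLogEntry:
    system_toxicity: float
    user_toxicity: float
    blocked: bool

def evaluate_prompts(system_toxicity: float, user_toxicity: float,
                     threshold: float) -> PromptLogEntry:
    # Block if EITHER prompt exceeds the configured guardrail threshold,
    # but record both toxicity scores regardless of which one triggered it.
    blocked = system_toxicity > threshold or user_toxicity > threshold
    return PromptLogEntry(system_toxicity, user_toxicity, blocked)

entry = evaluate_prompts(system_toxicity=0.2, user_toxicity=0.8, threshold=0.5)
print(entry.blocked)  # True: the user prompt exceeded the threshold
```

The key design point mirrored here is that logging is decoupled from blocking: both scores land in the log entry even when only one prompt caused the block, which is what makes post-hoc auditing possible.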

AI Guardrails masking functionality now supports additional entities and expanded regional language

Enhancements strengthen data loss prevention (DLP) controls by broadening entity coverage and enabling reliable masking across additional global languages. Masking and unmasking operations are fully functional across all three sensitive data categories (PII, PCI, and PHI). For the full list, see Data masking in AI.

AI Guardrails now supports masking and unmasking for the following languages: Russian, Hindi, Japanese, Korean, Mandarin (Traditional Chinese), and Portuguese.
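As a conceptual illustration of the mask/unmask round trip described above, here is a minimal regex-based sketch. It is not AI Guardrails' actual detection engine; the entity patterns, placeholder format, and function names are hypothetical:

```python
# Illustrative mask/unmask round trip -- NOT the AI Guardrails engine.
# Entity patterns and placeholder tokens are hypothetical examples.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask(text: str):
    """Replace detected entities with placeholders; keep a vault for unmasking."""
    vault = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            token = f"<{label}_{i}>"
            vault[token] = match
            text = text.replace(match, token, 1)
    return text, vault

def unmask(text: str, vault: dict) -> str:
    """Restore the original values, e.g. after an LLM response comes back."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text

masked, vault = mask("Contact jane@example.com, SSN 123-45-6789")
print(masked)                  # Contact <EMAIL_0>, SSN <SSN_0>
print(unmask(masked, vault))   # Contact jane@example.com, SSN 123-45-6789
```

The round trip matters because only the masked text leaves the guardrail boundary; the vault of original values never reaches the model, which is what makes unmasking of the response safe.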

Fixes

AI Prompt logs in AI Governance now display more than 1,000 records as expected. Previously, records beyond that limit would not load.

Limitations

In Arabic, masking is partially supported. Some entities might not be detected or masked consistently.