Review the new features, fixes, and limitations in AI Governance, AI Guardrails, and AI Evaluations for the v.40 release.

What's new

AI Evaluations deliver governed, actionable performance insights for Agents and Skills

AI Evaluations introduces controlled, metered evaluation of AI Agents and AI Skills, with licensing and AI credit consumption tied to entitlement tracking and enforcement for Cloud environments. This capability enables teams to validate and benchmark AI performance with automated evaluation built into the AI agent development lifecycle. Licensed users can access the evaluation feature, along with automated scores and details, through new evaluation pages under the AI menu. See AI Evaluations.

Available for Cloud environments only.

Entitlement & Usage Controls: Requires appropriate licensing (APA Essentials or APA Pro) and AI credits with usage tracking and enforcement.

Automatic & Manual Tools: Built-in support for automatic and manual evaluations using predefined metrics to measure performance and provide scoring details.

Detailed Insights: Scores are backed by industry and research metrics, with breakdowns that illuminate expected vs. actual interactions, execution sequences, and behavior patterns.

Flexible Dataset Support: Upload, reuse, or manually define datasets with secure, audit-aligned retention for repeatable evaluation cycles. The maximum file size is 50 MB. Datasets are retained for one year, and the retention period resets on use.
Note: Upload is available only when evaluating AI Skills.

AI Evaluations helps teams optimize the quality, reliability, and governance of AI-powered automations and agentic processes, both before and after production deployment.

AI Agent Audit logs now available in AI Governance
AI Governance now provides complete visibility and traceability of AI Agent activities and interactions with LLMs for governance and compliance auditing. Comprehensive audit trails help ensure compliance with security policies and responsible AI governance requirements.
  • Track all agent executions from start to completion with detailed input/output logging.
  • Monitor LLM interactions, tool calls, and system responses in real time.
  • 180-day log retention with drill-down capabilities for investigation.

Fixes

AI Prompt logs in AI Governance now display beyond 1,000 records, as expected. Previously, records beyond that limit would not load.

Limitations

Masking is partially supported in Arabic: some entities might not be detected or masked consistently.