Near-Miss: Latent Policy Failure Detection in Agentic Workflows
Ella Rabinovich, David Boaz, Naama Zwerdling, Ateret Anaby-Tavor

TL;DR
This paper introduces a new metric to detect subtle latent policy failures in agentic workflows, revealing issues overlooked by traditional outcome-based evaluations.
Contribution
It builds on the ToolGuard framework, which converts natural-language policies into executable guard code, to check whether an agent's tool-calling decisions were sufficiently informed, exposing latent failures in LLM-based agents.
Findings
Latent failures occur in 8-17% of trajectories with tool calls.
Outcome-based evaluation, which compares the final system state against a predefined ground truth, misses these subtle policy bypasses.
The new metric uncovers blind spots in policy adherence assessment.
Abstract
Agentic systems for business process automation often require compliance with policies governing conditional updates to the system state. Evaluation of policy adherence in LLM-based agentic workflows is typically performed by comparing the final system state against a predefined ground truth. While this approach detects explicit policy violations, it may overlook a more subtle class of issues in which agents bypass required policy checks, yet reach a correct outcome due to favorable circumstances. We refer to such cases as near-misses or latent failures. In this work, we introduce a novel metric for detecting latent policy failures in agent conversation traces. Building on the ToolGuard framework, which converts natural-language policies into executable guard code, our method analyzes agent trajectories to determine whether the agent's tool-calling decisions were sufficiently informed. We…
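To make the near-miss idea concrete, here is a minimal Python sketch. It is not the paper's metric and does not use the ToolGuard API. It assumes a simplified, hypothetical policy form in which each state-changing tool lists the read-only checks that must appear earlier in the trajectory. The tool names (`get_reservation_details`, `cancel_reservation`), the `ToolCall` type, and both helper functions are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    name: str
    mutating: bool = False  # True if the call updates the system state


# Hypothetical policy: each mutating tool maps to the read-only checks
# that must have been called earlier in the same trajectory.
REQUIRED_CHECKS = {
    "cancel_reservation": {"get_reservation_details"},
}


def uninformed_calls(trajectory, required_checks):
    """Return (tool, missing_checks) for mutating calls made without their checks."""
    seen = set()
    flagged = []
    for call in trajectory:
        if call.mutating:
            missing = required_checks.get(call.name, set()) - seen
            if missing:
                flagged.append((call.name, sorted(missing)))
        seen.add(call.name)
    return flagged


def classify(trajectory, final_state_correct, required_checks=REQUIRED_CHECKS):
    """Label a trajectory as an explicit failure, a near-miss, or compliant."""
    if not final_state_correct:
        # Outcome-based evaluation already catches this case.
        return "explicit_failure"
    # Correct outcome, but was every mutating decision sufficiently informed?
    return "near_miss" if uninformed_calls(trajectory, required_checks) else "compliant"
```

Under these assumptions, a trajectory that calls `cancel_reservation` without first calling `get_reservation_details` is labeled a near-miss whenever the final state happens to match the ground truth, even though an outcome-only evaluation would count it as a success.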
