The Verifier Tax: Horizon Dependent Safety Success Tradeoffs in Tool Using LLM Agents
Tanmay Sah, Vishal Srivastava, Dolly Sah, and Kayden Jordan

TL;DR
This paper investigates how runtime safety enforcement affects the end-to-end task performance of tool-using large language model agents in the Airline and Retail domains of tau-bench, revealing a persistent safety gap and the limitations of current mitigation strategies.
Contribution
It introduces a comprehensive analysis of safety tradeoffs in tool-using LLM agents, highlighting the verifier tax and the need for improved grounded identity verification methods.
Findings
Safety mediation intercepts up to 94% of non-compliant actions
Safe success rate (SSR) remains below 5% in most settings
Recovery after blocked actions is generally low
Abstract
We study how runtime enforcement against unsafe actions affects end-to-end task performance in multi-step, tool-using large language model (LLM) agents. Using tau-bench across the Airline and Retail domains, we compare baseline Tool-Calling, planning-integrated (TRIAD), and policy-mediated (TRIAD-SAFETY) architectures with GPT-OSS-20B and GLM-4-9B. We identify model-dependent interaction horizons (15 to 30 turns) and decompose outcomes into overall success rate (SR), safe success rate (SSR), and unsafe success rate (USR). Our results reveal a persistent Safety Capability Gap. While safety mediation can intercept up to 94 percent of non-compliant actions, it rarely translates into strictly safe goal attainment (SSR below 5 percent in most settings). We find that high unsafe success rates are primarily driven by Integrity Leaks, where models hallucinate user identifiers to bypass mandatory…
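The SR/SSR/USR decomposition named in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not spell out the definitions, that a "safe success" is an episode that reaches its goal with no policy violation, an "unsafe success" is one that reaches its goal with at least one violation, and that SR = SSR + USR. The `Episode` record, its field names, and the toy data are hypothetical and are not taken from the paper or from tau-bench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    # Hypothetical per-episode record; field names are illustrative only.
    goal_reached: bool      # did the agent complete the task?
    violated_policy: bool   # did any executed action break a policy rule?


def decompose(episodes):
    """Split the overall success rate into safe and unsafe parts.

    Assumed definitions (not confirmed by the source):
      SSR = fraction of episodes that succeed without any violation
      USR = fraction of episodes that succeed with at least one violation
      SR  = SSR + USR
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    safe = sum(e.goal_reached and not e.violated_policy for e in episodes)
    unsafe = sum(e.goal_reached and e.violated_policy for e in episodes)
    return {"SR": (safe + unsafe) / n, "SSR": safe / n, "USR": unsafe / n}


# Toy data, invented for illustration: one safe success,
# two unsafe successes, and one failure.
toy = [
    Episode(goal_reached=True, violated_policy=False),
    Episode(goal_reached=True, violated_policy=True),
    Episode(goal_reached=True, violated_policy=True),
    Episode(goal_reached=False, violated_policy=False),
]
rates = decompose(toy)
print(rates)
```

Under these assumed definitions, a high SR can coexist with a very low SSR when most successes involve a violation, which is the pattern the abstract describes.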
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education
