The Verifier Tax: Horizon Dependent Safety Success Tradeoffs in Tool Using LLM Agents

Tanmay Sah, Vishal Srivastava, Dolly Sah, and Kayden Jordan

arXiv:2603.19328·cs.CR·March 23, 2026

TL;DR

This paper investigates how runtime safety enforcement affects the end-to-end task performance of tool-using LLM agents in the tau-bench Airline and Retail domains, revealing a persistent Safety Capability Gap and the limitations of current mitigation strategies.

Contribution

It analyzes safety tradeoffs in tool-using LLM agents, highlighting the verifier tax and the need for better grounded identity verification methods.

Findings

01 Safety mediation intercepts up to 94% of non-compliant actions

02 Safe success rate (SSR) remains below 5% in most settings

03 Recovery after blocked actions is generally low

Abstract

We study how runtime enforcement against unsafe actions affects end-to-end task performance in multi-step, tool-using large language model (LLM) agents. Using tau-bench across its Airline and Retail domains, we compare baseline Tool-Calling, planning-integrated (TRIAD), and policy-mediated (TRIAD-SAFETY) architectures with GPT-OSS-20B and GLM-4-9B. We identify model-dependent interaction horizons (15 to 30 turns) and decompose outcomes into overall success rate (SR), safe success rate (SSR), and unsafe success rate (USR). Our results reveal a persistent Safety Capability Gap: while safety mediation can intercept up to 94 percent of non-compliant actions, this rarely translates into strictly safe goal attainment (SSR below 5 percent in most settings). We find that high unsafe success rates are primarily driven by Integrity Leaks, where models hallucinate user identifiers to bypass mandatory…
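The SR/SSR/USR decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so this assumes that a successful episode counts as "safe" when no policy-violating action was executed and "unsafe" otherwise, which makes SR = SSR + USR. The `Episode` record, its fields, and the `decompose` function are hypothetical names chosen for illustration, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Hypothetical per-task record; field names are assumptions."""
    succeeded: bool  # the task goal was reached
    violated: bool   # at least one policy-violating action was executed


def decompose(episodes):
    """Return SR, SSR, and USR as fractions of all episodes.

    Assumed definitions (not confirmed by the source):
      SSR = successes with no violation / total
      USR = successes with a violation  / total
      SR  = SSR + USR
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    safe = sum(e.succeeded and not e.violated for e in episodes)
    unsafe = sum(e.succeeded and e.violated for e in episodes)
    return {"SR": (safe + unsafe) / n, "SSR": safe / n, "USR": unsafe / n}


# Made-up data for illustration only; not results from the paper.
episodes = [
    Episode(succeeded=True, violated=False),
    Episode(succeeded=True, violated=True),
    Episode(succeeded=False, violated=True),
    Episode(succeeded=False, violated=False),
]
print(decompose(episodes))
```

Under these assumed definitions, a high SR can coexist with a very low SSR whenever most successes involve a violation, which is the pattern the abstract describes as the Safety Capability Gap.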

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education