Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

Kaituo Zhang; Zhen Xiong; Mingyu Zhong; Zhimeng Jiang; Zhouyuan Yuan; Zhecheng Li; Ying Lin

arXiv:2605.00136·cs.AI·May 4, 2026

TL;DR

This paper examines when tool-augmented reasoning actually helps LLM agents. Under semantic noise, the gains from executing tools often fail to offset a 'tool-use tax', the performance degradation introduced by the tool-calling protocol itself. The authors propose G-STEP, a lightweight inference-time gate, to mitigate protocol-induced errors.

Contribution

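The paper introduces the Factorized Intervention Framework, which isolates the cost of prompt formatting, the overhead of the tool-calling protocol, and the gain from executing tools, and proposes G-STEP, an inference-time gate that reduces protocol-induced errors in LLM reasoning.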

Findings

01

Tool-augmented reasoning does not always outperform native CoT under semantic distractors.

02

The 'tool-use tax', the performance degradation introduced by the tool-calling protocol itself, can outweigh the gains from executing tools under semantic noise.

03

G-STEP mitigates protocol-induced errors, but the recovery is only partial.

Abstract

Tool-augmented reasoning has become a popular direction for LLM-based agents, and it is widely assumed to improve reasoning and reliability. However, we demonstrate that this consensus does not always hold: in the presence of semantic distractors, tool-augmented reasoning does not necessarily outperform native CoT. To explain this performance gap, we propose a Factorized Intervention Framework that isolates the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools. Our analysis reveals a critical tradeoff: under semantic noise, the gains from tools often fail to offset the "tool-use tax", which is the performance degradation introduced by the tool-calling protocol itself. To address this, we introduce G-STEP, a lightweight inference-time gate to mitigate protocol-induced errors. While this yields partial recovery, our findings…
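The three-way split described in the abstract can be illustrated with a small sketch. It assumes four hypothetical evaluation conditions (native CoT, tool-style prompt only, tool calls emitted but not executed, and full tool use) and treats each factor as the accuracy difference between adjacent conditions. The condition names, the `Accuracies` and `decompose` helpers, and the numbers are illustrative assumptions, not the paper's actual protocol or results.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Accuracies:
    """Accuracy (0..1) under four hypothetical conditions; names are assumptions."""
    native_cot: float        # plain chain-of-thought, no tool prompt
    tool_prompt_only: float  # tool-style prompt formatting, no tool calls made
    protocol_no_exec: float  # tool calls emitted but not executed
    full_tool_use: float     # tool calls emitted and executed


def decompose(acc: Accuracies) -> dict:
    """Split the net change from native CoT to full tool use into three terms.

    The terms telescope, so they always sum to the net difference.
    """
    formatting_cost = acc.tool_prompt_only - acc.native_cot
    protocol_cost = acc.protocol_no_exec - acc.tool_prompt_only
    execution_gain = acc.full_tool_use - acc.protocol_no_exec
    return {
        "formatting_cost": formatting_cost,
        "protocol_cost": protocol_cost,
        "execution_gain": execution_gain,
        "net": acc.full_tool_use - acc.native_cot,
    }


# Placeholder numbers chosen for illustration only (not taken from the paper).
example = Accuracies(
    native_cot=0.70,
    tool_prompt_only=0.68,
    protocol_no_exec=0.60,
    full_tool_use=0.66,
)
parts = decompose(example)
for name, value in parts.items():
    print(f"{name:>16}: {value:+.2f}")
```

In this made-up case the execution gain is positive, yet the combined formatting and protocol costs are larger, so the net effect of adding tools is negative. That is the pattern the abstract calls the "tool-use tax".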
