Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents
Kaituo Zhang, Zhen Xiong, Mingyu Zhong, Zhimeng Jiang, Zhouyuan Yuan, Zhecheng Li, Ying Lin

TL;DR
This paper investigates the effectiveness of tool-augmented reasoning in LLM agents, revealing that under semantic noise its benefits can be outweighed by a 'tool-use tax' arising from tool-calling protocol overhead, and proposes G-STEP to mitigate this issue.
Contribution
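It introduces the Factorized Intervention Framework, which separates the cost of prompt formatting, the overhead of the tool-calling protocol, and the gain from tool execution, and proposes G-STEP, a lightweight inference-time gate, to reduce protocol-induced errors in LLM reasoning.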
Findings
Tool-augmented reasoning does not always outperform native CoT under semantic distractors.
The 'tool-use tax' can negate the benefits of tool integration due to protocol overhead.
G-STEP partially mitigates protocol-induced errors, but further improvements are needed.
Abstract
Tool-augmented reasoning has become a popular direction for LLM-based agents, and it is widely assumed to improve reasoning and reliability. However, we demonstrate that this consensus does not always hold: in the presence of semantic distractors, tool-augmented reasoning does not necessarily outperform native CoT. To explain this performance gap, we propose a Factorized Intervention Framework that isolates the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools. Our analysis reveals a critical tradeoff: under semantic noise, the gains from tools often fail to offset the "tool-use tax", which is the performance degradation introduced by the tool-calling protocol itself. To address this, we introduce G-STEP, a lightweight inference-time gate to mitigate protocol-induced errors. While this yields partial recovery, our findings…
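The three-way split described in the abstract can be read as a telescoping decomposition of the accuracy gap between tool-augmented reasoning and native CoT. The sketch below is a minimal illustration under stated assumptions, not the authors' framework: the four condition names (`native_cot`, `tool_format_only`, `protocol_no_exec`, `full_tool_use`) are hypothetical, and isolating protocol overhead by running the tool-calling protocol with tool results withheld is an assumed intervention design. It only shows how the formatting effect, the tool-use tax, and the execution gain add up to the total difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionAccuracies:
    """Accuracy (0..1) under four HYPOTHETICAL intervention conditions."""
    native_cot: float        # plain chain-of-thought prompt, no tools
    tool_format_only: float  # tool-style prompt formatting, no tool-calling protocol
    protocol_no_exec: float  # tool-calling protocol, but tool results withheld (assumption)
    full_tool_use: float     # protocol plus real tool execution


@dataclass(frozen=True)
class Decomposition:
    formatting_effect: float  # change caused by prompt formatting alone
    tool_use_tax: float       # degradation caused by the protocol itself (positive = worse)
    execution_gain: float     # improvement from actually executing tools
    total_delta: float        # full tool use minus native CoT


def factorize(acc: ConditionAccuracies) -> Decomposition:
    """Split the tool-vs-CoT gap into three telescoping terms.

    By construction:
        total_delta == formatting_effect - tool_use_tax + execution_gain
    """
    formatting_effect = acc.tool_format_only - acc.native_cot
    tool_use_tax = acc.tool_format_only - acc.protocol_no_exec
    execution_gain = acc.full_tool_use - acc.protocol_no_exec
    total_delta = acc.full_tool_use - acc.native_cot
    return Decomposition(formatting_effect, tool_use_tax, execution_gain, total_delta)


def tools_pay_off(d: Decomposition) -> bool:
    """Tools help only if execution gain outweighs the tax net of formatting effects."""
    return d.execution_gain > d.tool_use_tax - d.formatting_effect
```

Under this accounting, a large execution gain can still leave tool use behind native CoT whenever the protocol tax is larger, which is the tradeoff the abstract describes for semantically noisy inputs. Any accuracies plugged into `ConditionAccuracies` would have to come from real evaluations; none are given here.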
