Sponge Tool Attack: Stealthy Denial-of-Efficiency against Tool-Augmented Agentic Reasoning
Qi Li, Xinchao Wang

TL;DR
This paper introduces the Sponge Tool Attack (STA), a stealthy attack on tool-augmented LLM reasoning that rewrites only the input prompt, under query-only access, so that the agent follows a longer and costlier reasoning trajectory. Neither the model nor the external tools are modified.
Contribution
The paper presents a novel prompt-rewriting attack that exploits tool-augmented reasoning, highlighting a new vulnerability and demonstrating its effectiveness across multiple models and datasets.
Findings
STA significantly increases computational overhead in reasoning tasks.
The attack remains stealthy by preserving original task semantics.
The attack is effective across diverse models, tools, and datasets.
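One way to make the overhead and stealth findings concrete is to compare a clean run of an agent with a run on a rewritten prompt. The sketch below is illustrative only: the `Trajectory` record, the `efficiency_overhead` function, the choice of ratios, and the toy numbers are all assumptions made for this example, not the paper's metrics or results. The `answer_preserved` flag is a crude proxy for whether the task still completes, not a measure of prompt semantics.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One agent run (hypothetical record, not a structure from the paper)."""
    tool_calls: list[str]   # names of tools invoked, in order
    total_tokens: int       # tokens consumed over the whole run
    final_answer: str       # answer returned to the user


def efficiency_overhead(clean: Trajectory, attacked: Trajectory) -> dict:
    """Compare an attacked run against a clean baseline run.

    Returns ratios > 1.0 when the attacked run is more expensive.
    """
    if not clean.tool_calls or clean.total_tokens <= 0:
        raise ValueError("baseline run must have at least one tool call and some tokens")
    return {
        "step_ratio": len(attacked.tool_calls) / len(clean.tool_calls),
        "token_ratio": attacked.total_tokens / clean.total_tokens,
        # Proxy for stealth: the user still receives the same answer.
        "answer_preserved": attacked.final_answer == clean.final_answer,
    }


# Toy values invented for illustration; they are not measurements.
clean = Trajectory(["calculator"], 200, "42")
attacked = Trajectory(["search", "calculator", "search", "calculator"], 900, "42")
report = efficiency_overhead(clean, attacked)
print(report)
```

Under these assumptions, an attack of this kind would show step and token ratios well above 1.0 while `answer_preserved` stays true, which is what makes the extra cost hard to notice from the output alone.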
Abstract
Enabling large language models (LLMs) to solve complex reasoning tasks is a key step toward artificial general intelligence. Recent work augments LLMs with external tools to enable agentic reasoning, achieving high utility and efficiency in a plug-and-play manner. However, the inherent vulnerabilities of such methods to malicious manipulation of the tool-calling process remain largely unexplored. In this work, we identify a tool-specific attack surface and propose Sponge Tool Attack (STA), which disrupts agentic reasoning solely by rewriting the input prompt under a strict query-only access assumption. Without any modification to the underlying model or the external tools, STA converts originally concise and efficient reasoning trajectories into unnecessarily verbose and convoluted ones before arriving at the final answer. This results in substantial computational overhead while…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Ethics and Social Impacts of AI
