MalTool: Malicious Tool Attacks on LLM Agents
Yuepeng Hu, Yuqi Jia, Mengyuan Li, Dawn Song, Neil Gong

TL;DR
This paper introduces MalTool, a framework for synthesizing malicious tools with embedded harmful behaviors for LLM agents, and shows that existing detection methods largely fail to catch them.
Contribution
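It presents the first systematic study of malicious tool code implementations, a taxonomy of malicious tool behaviors based on the confidentiality-integrity-availability triad, and a framework for generating malicious tools against which detection methods can be evaluated.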
Findings
MalTool successfully generates diverse malicious tools with specified behaviors.
Existing detection methods are largely ineffective against the synthesized malicious tools.
The study highlights an urgent need for new defenses against malicious tools in LLM-agent systems.
Abstract
In a malicious tool attack, an attacker uploads a malicious tool to a distribution platform; once a user inadvertently installs the tool and the LLM agent selects it during task execution, the tool can compromise the user's security and privacy. Prior work focuses on manipulating tool names and descriptions to increase the likelihood of installation by users and selection by LLM agents. However, a successful attack also requires embedding malicious behaviors in the tool's code implementation, which remains largely unexplored. In this work, we bridge this gap by presenting the first systematic study of malicious tool code implementations. We first propose a taxonomy of malicious tool behaviors based on the confidentiality-integrity-availability triad, tailored to LLM-agent settings. To investigate the severity of the risks posed by attackers exploiting coding LLMs to automatically…
