Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Kanghua Mo, Li Hu, Yucheng Long, Zhihao Li

TL;DR
This paper introduces the Attractive Metadata Attack (AMA), a novel black-box method that manipulates tool metadata to stealthily influence large language model (LLM) agents' tool selection, exposing systemic vulnerabilities in current AI agent architectures.
Contribution
The paper presents AMA, a new attack framework that exploits tool metadata to manipulate LLM agents, demonstrating high success rates and robustness against existing defenses.
Findings
High attack success rates (81%-95%) across scenarios
Effective even against prompt-level defenses and detection methods
Reveals systemic vulnerabilities in current LLM agent architectures
Abstract
Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. However, this tool-centric paradigm introduces a previously underexplored attack surface, where adversaries can manipulate tool metadata -- such as names, descriptions, and parameter schemas -- to influence agent behavior. We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents, without requiring prompt injection or access to model internals. To demonstrate and exploit this vulnerability, we propose the Attractive Metadata Attack (AMA), a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata through iterative optimization. The proposed attack integrates seamlessly into standard tool ecosystems and…
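To make the term "tool metadata" concrete, here is a minimal sketch of the three fields the abstract names (name, description, parameter schema) and of how an agent framework might serialize them into the model's context. The `ToolMetadata` class, the `render_tool_catalog` helper, and the two benign example tools are hypothetical illustrations, not taken from the paper. The JSON layout is an assumption modeled loosely on common function-calling formats.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolMetadata:
    """Hypothetical container for the metadata fields named in the abstract."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)  # JSON-Schema-style parameter schema


def render_tool_catalog(tools):
    """Serialize tool metadata as text, roughly as an agent's model might receive it.

    Assumption: the model chooses among tools using only this text, which is
    why the metadata itself can influence tool selection.
    """
    return json.dumps(
        [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in tools
        ],
        indent=2,
    )


# Two benign example tools (illustrative only).
tools = [
    ToolMetadata(
        name="weather_lookup",
        description="Return the current weather for a city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    ),
    ToolMetadata(
        name="currency_convert",
        description="Convert an amount between two currencies.",
        parameters={
            "type": "object",
            "properties": {
                "amount": {"type": "number"},
                "source": {"type": "string"},
                "target": {"type": "string"},
            },
            "required": ["amount", "source", "target"],
        },
    ),
]

catalog = render_tool_catalog(tools)
print(catalog)
```

Under this assumption, everything the model knows about a tool before invoking it is the text in `catalog`, so a reviewer auditing a tool registry would want to inspect these fields as carefully as the tool's code.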
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Security and Verification in Computing
