TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation
Hengkai Ye, Zhechang Zhang, Jinyuan Jia, Hong Hu

TL;DR
TRUSTDESC is a framework that automatically generates trusted, implementation-faithful tool descriptions for LLMs to prevent tool poisoning attacks, improving safety and task success.
Contribution
It introduces a novel three-stage pipeline combining static analysis, synthesis, and dynamic verification to produce trustworthy tool descriptions for LLM applications.
Findings
TRUSTDESC improves task completion rates with accurate descriptions.
It effectively mitigates implicit tool poisoning attacks.
The framework incurs minimal additional time and cost.
Abstract
Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration expands LLM capabilities, it also introduces a new prompt-injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing defenses mainly detect anomalous instructions and remain ineffective against implicit TPAs. In this paper, we present TRUSTDESC, the first framework for preventing tool poisoning by automatically generating trusted tool descriptions from implementations. TRUSTDESC derives implementation-faithful descriptions through a three-stage pipeline. SliceMin performs reachability-aware static analysis and LLM-guided debloating to extract minimal tool-relevant…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
