MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, William Yang Wang

TL;DR
MELON is a novel defense mechanism against indirect prompt injection attacks in AI agents, using re-execution with masked prompts to detect malicious behavior while maintaining utility.
Contribution
We introduce MELON, a new provable defense against IPI attacks that detects malicious actions by comparing original and masked execution trajectories, outperforming existing methods.
Findings
MELON outperforms state-of-the-art defenses in attack prevention.
MELON maintains higher utility compared to existing defenses.
Combining MELON with prompt augmentation further enhances security.
Abstract
Recent research has explored that LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions. Existing defenses against IPI have significant limitations: either require essential model training resources, lack effectiveness against sophisticated attacks, or harm the normal utilities. We present MELON (Masked re-Execution and TooL comparisON), a novel IPI defense. Our approach builds on the observation that under a successful attack, the agent's next action becomes less dependent on user tasks and more on malicious tasks. Following this, we design MELON to detect attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function. We identify an attack if the actions generated in the original and masked executions are…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSecurity and Verification in Computing · Physical Unclonable Functions (PUFs) and Hardware Security · Radiation Effects in Electronics
