AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song

TL;DR
AgentVigil is a black-box fuzzing framework that automatically discovers and exploits indirect prompt injection vulnerabilities in LLM-based agents, significantly improving attack success rates and demonstrating real-world risks.
Contribution
This work introduces a novel black-box fuzzing approach that uses Monte Carlo Tree Search (MCTS) to discover indirect prompt injection vulnerabilities in diverse LLM agents.
Findings
Achieves 71% and 70% success rates on two benchmarks.
Nearly doubles the success rate of baseline attacks.
Successfully misleads agents in real-world scenarios.
Abstract
The strong planning and reasoning capabilities of Large Language Models (LLMs) have fostered the development of agent-based systems capable of leveraging external tools and interacting with increasingly complex environments. However, these powerful features also introduce a critical security risk: indirect prompt injection, a sophisticated attack vector that compromises the core of these agents, the LLM, by manipulating contextual information rather than direct user prompts. In this work, we propose a generic black-box fuzzing framework, AgentVigil, designed to automatically discover and exploit indirect prompt injection vulnerabilities across diverse LLM agents. Our approach starts by constructing a high-quality initial seed corpus, then employs a seed selection algorithm based on Monte Carlo Tree Search (MCTS) to iteratively refine inputs, thereby maximizing the likelihood of…
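The abstract describes a loop that starts from a seed corpus and uses MCTS-based seed selection to refine inputs. The sketch below is one generic way such a loop could look. It is not the authors' implementation. The names `SeedNode`, `ucb_score`, `select`, `fuzz`, `toy_mutate`, and `toy_score` are hypothetical, the UCB1 selection rule and the early-stop probability are assumptions, and the toy scorer with placeholder tokens stands in for running a real agent and judging its output.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class SeedNode:
    """One candidate input in the seed tree (hypothetical structure)."""
    text: str
    parent: Optional["SeedNode"] = None
    children: List["SeedNode"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def ucb_score(node: SeedNode, parent_visits: int, c: float = 1.4) -> float:
    """UCB1: mean reward plus an exploration bonus for rarely tried seeds."""
    if node.visits == 0:
        return math.inf  # untried seeds are always explored first
    mean = node.total_reward / node.visits
    return mean + c * math.sqrt(math.log(parent_visits) / node.visits)


def select(root: SeedNode, rng: random.Random, stop_prob: float = 0.2) -> SeedNode:
    """Descend by best UCB score; stop early sometimes so inner nodes can branch."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ucb_score(ch, node.visits))
        if rng.random() < stop_prob:
            break
    return node


def fuzz(
    seeds: List[str],
    mutate: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    iterations: int,
    rng: random.Random,
    threshold: float = 1.0,
) -> Tuple[SeedNode, List[str]]:
    """Select a seed, mutate it, score the mutant, and back-propagate the reward."""
    root = SeedNode(text="")
    for s in seeds:
        root.children.append(SeedNode(text=s, parent=root))
    successes: List[str] = []
    for _ in range(iterations):
        chosen = select(root, rng)
        candidate = SeedNode(text=mutate(chosen.text, rng), parent=chosen)
        chosen.children.append(candidate)
        reward = score(candidate.text)
        if reward >= threshold:
            successes.append(candidate.text)
        cur: Optional[SeedNode] = candidate
        while cur is not None:  # update statistics on the path back to the root
            cur.visits += 1
            cur.total_reward += reward
            cur = cur.parent
    return root, successes


# Toy stand-ins (assumptions): a real setup would call the agent and judge its actions.
TOKENS = ["alpha", "beta", "gamma"]


def toy_mutate(text: str, rng: random.Random) -> str:
    """Append one random placeholder token to the seed."""
    return (text + " " + rng.choice(TOKENS)).strip()


def toy_score(text: str) -> float:
    """Fraction of placeholder tokens present; 1.0 counts as a success."""
    return sum(tok in text for tok in TOKENS) / len(TOKENS)


if __name__ == "__main__":
    tree, hits = fuzz(["seed a", "seed b"], toy_mutate, toy_score, 50, random.Random(0))
    print(f"iterations: {tree.visits}, successful candidates: {len(hits)}")
```

The exploration constant `c` and `stop_prob` trade off reusing seeds that already score well against trying less-visited ones; both values here are arbitrary choices for the sketch.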
Taxonomy
Topics: Access Control and Trust · Security and Verification in Computing
