AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents

Zhun Wang; Vincent Siu; Zhe Ye; Tianneng Shi; Yuzhou Nie; Xuandong Zhao; Chenguang Wang; Wenbo Guo; Dawn Song

arXiv:2505.05849 · cs.CR · June 17, 2025

TL;DR

AgentVigil is a black-box fuzzing framework that automatically discovers and exploits indirect prompt injection vulnerabilities in LLM-based agents. It reaches 71% and 70% attack success rates on two benchmarks, nearly double the baseline attacks, and successfully misleads agents in real-world scenarios.

Contribution

This work introduces a black-box fuzzing approach that uses Monte Carlo Tree Search (MCTS) for seed selection to discover indirect prompt injection vulnerabilities in diverse LLM agents.

Findings

01. Achieves 71% and 70% success rates on two benchmarks.

02. Nearly doubles the performance of baseline attacks.

03. Successfully misleads agents in real-world scenarios.

Abstract

The strong planning and reasoning capabilities of Large Language Models (LLMs) have fostered the development of agent-based systems capable of leveraging external tools and interacting with increasingly complex environments. However, these powerful features also introduce a critical security risk: indirect prompt injection, a sophisticated attack vector that compromises the core of these agents, the LLM, by manipulating contextual information rather than direct user prompts. In this work, we propose a generic black-box fuzzing framework, AgentVigil, designed to automatically discover and exploit indirect prompt injection vulnerabilities across diverse LLM agents. Our approach starts by constructing a high-quality initial seed corpus, then employs a seed selection algorithm based on Monte Carlo Tree Search (MCTS) to iteratively refine inputs, thereby maximizing the likelihood of…
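The seed-selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `SeedNode` structure, the UCB1 scoring rule, the early-stop probability `stop_prob`, and the `mutate` and `score` callbacks are all assumptions chosen for illustration, and the toy callbacks stand in for a real mutator and a real measurement of agent behavior.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class SeedNode:
    """One template in the seed tree (hypothetical structure)."""
    template: str
    parent: Optional["SeedNode"] = None
    children: List["SeedNode"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def ucb_score(node: SeedNode, parent_visits: int, c: float) -> float:
    """UCB1: mean reward plus an exploration bonus; unvisited nodes go first."""
    if node.visits == 0:
        return math.inf
    mean = node.total_reward / node.visits
    return mean + c * math.sqrt(math.log(parent_visits) / node.visits)


def select(root: SeedNode, rng: random.Random, stop_prob: float, c: float) -> SeedNode:
    """Descend by UCB1, stopping early with probability stop_prob (assumed rule)."""
    node = root
    while node.children:
        parent_visits = max(node.visits, 1)
        node = max(node.children, key=lambda ch: ucb_score(ch, parent_visits, c))
        if rng.random() < stop_prob:
            break
    return node


def backpropagate(node: Optional[SeedNode], reward: float) -> None:
    """Credit the reward to the new node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def fuzz(
    initial_seeds: List[str],
    mutate: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    iterations: int,
    rng: random.Random,
    stop_prob: float = 0.2,
    c: float = 1.4,
) -> Tuple[SeedNode, str, float]:
    """Run select -> mutate -> score -> backpropagate for a fixed budget."""
    root = SeedNode(template="")  # virtual root; initial seeds are its children
    for seed in initial_seeds:
        root.children.append(SeedNode(template=seed, parent=root))
    best, best_score = "", -math.inf
    for _ in range(iterations):
        node = select(root, rng, stop_prob, c)
        child = SeedNode(template=mutate(node.template, rng), parent=node)
        node.children.append(child)
        reward = score(child.template)
        backpropagate(child, reward)
        if reward > best_score:
            best, best_score = child.template, reward
    return root, best, best_score


# Toy stand-ins (assumptions): a real setup would mutate with an LLM and
# score by observing the target agent's behavior.
def toy_mutate(template: str, rng: random.Random) -> str:
    return template + " " + rng.choice(["alpha", "beta", "gamma"])


def toy_score(template: str) -> float:
    words = template.split()
    return words.count("gamma") / len(words)


def count_nodes(node: SeedNode) -> int:
    return 1 + sum(count_nodes(ch) for ch in node.children)
```

Calling `fuzz(["seed one", "seed two"], toy_mutate, toy_score, iterations=50, rng=random.Random(0))` returns the tree root together with the best-scoring template and its score. The early-stop probability keeps the search from always descending to a leaf, so promising intermediate templates can be mutated again; whether AgentVigil uses this particular rule is not stated in the abstract.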

Taxonomy

Topics: Access Control and Trust · Security and Verification in Computing