MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks
Georgios Syros, Evan Rose, Brian Grinstead, Christoph Kerschbaumer, William Robertson, Cristina Nita-Rotaru, Alina Oprea

TL;DR
MUZZLE is an adaptive framework that automatically evaluates web agents against indirect prompt injection attacks by identifying vulnerabilities and generating context-aware malicious prompts, revealing new attack vectors.
Contribution
It introduces MUZZLE, a novel adaptive attack framework that automates and refines prompt injection attacks on web agents, surpassing prior fixed-template methods.
Findings
Discovered 37 new attacks across 4 web applications.
Identified 2 cross-application prompt injection attacks.
Uncovered an agent-tailored phishing scenario.
Abstract
Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and performing actions on users' behalf. While these agents offer powerful capabilities, their design exposes them to indirect prompt injection attacks embedded in untrusted web content, enabling adversaries to hijack agent behavior and violate user intent. Despite growing awareness of this threat, existing evaluations rely on fixed attack templates, manually selected injection surfaces, or narrowly scoped scenarios, limiting their ability to capture realistic, adaptive attacks encountered in practice. We present MUZZLE, an automated agentic framework for evaluating the security of web agents against indirect prompt injection attacks. MUZZLE utilizes the agent's trajectories to automatically identify high-salience injection surfaces, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsWeb Application Security Vulnerabilities · Spam and Phishing Detection · Information and Cyber Security
