AdvAgent: Controllable Blackbox Red-teaming on Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li

TL;DR
AdvAgent introduces a reinforcement learning-based black-box red-teaming framework that effectively uncovers vulnerabilities in web agents, revealing significant security risks and exposing the limitations of current defenses.
Contribution
This work presents a novel RL-based adversarial prompt generator for black-box attacks on web agents, demonstrating high success rates and exposing weaknesses in existing defenses.
Findings
High success rates against GPT-4-based web agents
Existing defenses offer limited protection
Critical vulnerabilities in current web agents
Abstract
Foundation model-based agents are increasingly used to automate complex tasks, enhancing efficiency and productivity. However, their access to sensitive resources and autonomous decision-making also introduce significant security risks, where successful attacks could lead to severe consequences. To systematically uncover these vulnerabilities, we propose AdvAgent, a black-box red-teaming framework for attacking web agents. Unlike existing approaches, AdvAgent employs a reinforcement learning-based pipeline to train an adversarial prompter model that optimizes adversarial prompts using feedback from the black-box agent. With careful attack design, these prompts effectively exploit agent weaknesses while maintaining stealthiness and controllability. Extensive evaluations demonstrate that AdvAgent achieves high success rates against state-of-the-art GPT-4-based web agents across diverse…
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Internet Traffic Analysis and Secure E-voting
Methods: Direct Preference Optimization
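
The abstract describes a prompter model trained from black-box agent feedback, and the taxonomy lists Direct Preference Optimization as the method. The sketch below shows the standard per-pair DPO loss and one possible way to turn success/failure feedback into preference pairs. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `dpo_loss` and `build_pairs`, the pairing rule (every successful prompt preferred over every failed one), and the default `beta` are all hypothetical.

```python
import math


def dpo_loss(policy_logp_win, policy_logp_lose,
             ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the preferred ("win") and
    dispreferred ("lose") prompt under the trained policy and under a
    frozen reference model. beta=0.1 is an assumed default.
    """
    margin = beta * ((policy_logp_win - ref_logp_win)
                     - (policy_logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def build_pairs(feedback):
    """Pair prompts by black-box outcome (hypothetical pairing rule).

    feedback: list of (prompt, attack_succeeded) tuples observed from the
    target agent. Returns (preferred, dispreferred) prompt pairs.
    """
    wins = [p for p, ok in feedback if ok]
    losses = [p for p, ok in feedback if not ok]
    return [(w, l) for w in wins for l in losses]
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that as the policy shifts probability toward prompts that succeeded against the agent.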
