AdvAgent: Controllable Blackbox Red-teaming on Web Agents

Chejian Xu; Mintong Kang; Jiawei Zhang; Zeyi Liao; Lingbo Mo; Mengqi Yuan; Huan Sun; Bo Li

arXiv:2410.17401 · cs.CR · June 3, 2025

Open Access

TL;DR

AdvAgent introduces a reinforcement learning-based black-box red-teaming framework that effectively uncovers vulnerabilities in web agents, revealing significant security risks and exposing the limitations of current defenses.

Contribution

This work presents a novel RL-based adversarial prompt generator for black-box attacks on web agents, demonstrating high success rates and exposing weaknesses in existing defenses.

Findings

01 High success rates against GPT-4-based web agents

02 Existing defenses offer limited protection

03 Revealed critical vulnerabilities in current web agents

Abstract

Foundation model-based agents are increasingly used to automate complex tasks, enhancing efficiency and productivity. However, their access to sensitive resources and autonomous decision-making also introduce significant security risks, where successful attacks could lead to severe consequences. To systematically uncover these vulnerabilities, we propose AdvAgent, a black-box red-teaming framework for attacking web agents. Unlike existing approaches, AdvAgent employs a reinforcement learning-based pipeline to train an adversarial prompter model that optimizes adversarial prompts using feedback from the black-box agent. With careful attack design, these prompts effectively exploit agent weaknesses while maintaining stealthiness and controllability. Extensive evaluations demonstrate that AdvAgent achieves high success rates against state-of-the-art GPT-4-based web agents across diverse…

Taxonomy

Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Internet Traffic Analysis and Secure E-voting

Methods: Direct Preference Optimization
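
Because Direct Preference Optimization is listed as a method, a minimal sketch of the standard DPO loss for a single preference pair may help make the term concrete. This is the generic textbook objective, not the authors' training code. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions. How the paper builds its preference pairs is not stated on this page; one plausible reading of the abstract is that the "chosen" and "rejected" samples are separated by feedback from the black-box agent, but that is an assumption.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the total log-probability a model assigns to a
    full sample: `policy_*` from the model being trained, `ref_*` from
    a frozen reference model. `beta` scales the preference margin.
    """
    # Implicit reward margin: how much more the policy favors the
    # chosen sample over the rejected one, relative to the reference.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    z = beta * margin
    # -log(sigmoid(z)) written as softplus(-z) for numerical stability.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability toward the chosen sample and rises above it when the policy favors the rejected one.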