DECEPTICON: How Dark Patterns Manipulate Web Agents
Phil Cuvin, Hao Zhu, Diyi Yang

TL;DR
This paper introduces DECEPTICON, an environment for testing dark patterns on web agents. It shows that dark patterns are highly effective at manipulating agent behavior and that current defenses leave agents vulnerable.
Contribution
We present DECEPTICON, a testing environment with 700 web navigation tasks (600 generated, 100 real-world) that measures how strongly individual dark patterns, tested in isolation, influence web agents.
Findings
Dark patterns steer agents towards malicious outcomes in over 70% of tested tasks, compared to a human average of 31%.
Larger, more capable models are more susceptible to dark pattern manipulation.
Current countermeasures like in-context prompting are ineffective against dark patterns.
Abstract
Deceptive UI designs, widely instantiated across the web and commonly known as dark patterns, manipulate users into performing actions misaligned with their goals. In this paper, we show that dark patterns are highly effective in steering agent trajectories, posing a significant risk to agent robustness. To quantify this risk, we introduce DECEPTICON, an environment for testing individual dark patterns in isolation. DECEPTICON includes 700 web navigation tasks with dark patterns -- 600 generated tasks and 100 real-world tasks, designed to measure instruction-following success and dark pattern effectiveness. Across state-of-the-art agents, we find dark patterns successfully steer agent trajectories towards malicious outcomes in over 70% of tested generated and real-world tasks -- compared to a human average of 31%. Moreover, we find that dark pattern effectiveness correlates positively…
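The abstract names two measurements: instruction-following success and dark pattern effectiveness. The following is a minimal sketch of how such rates could be tallied from per-task outcomes. The `TaskResult` record, its field names, and the split labels are assumptions made for illustration; they are not taken from the DECEPTICON codebase, and the paper's exact metric definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task outcome record (not the paper's schema)."""
    task_id: str
    split: str               # assumed labels: "generated" or "real-world"
    task_completed: bool     # agent did what the user instruction asked
    pattern_triggered: bool  # agent took the action the dark pattern pushes


def rates(results):
    """Return the fraction of tasks completed and the fraction where the
    dark pattern succeeded, over a non-empty list of results."""
    if not results:
        raise ValueError("need at least one task result")
    n = len(results)
    return {
        "task_success_rate": sum(r.task_completed for r in results) / n,
        "dark_pattern_effectiveness": sum(r.pattern_triggered for r in results) / n,
    }


def rates_by_split(results):
    """Compute the same two rates separately for each split."""
    groups = {}
    for r in results:
        groups.setdefault(r.split, []).append(r)
    return {split: rates(rs) for split, rs in groups.items()}
```

Keeping the two booleans separate, rather than treating one as the negation of the other, allows for the case where an agent both completes the task and is steered by the dark pattern; whether the benchmark scores such cases this way is an assumption of this sketch.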
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Spam and Phishing Detection
