It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents
Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr B{\l}aszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip H.S. Torr, Adam Mahdi, Adel Bibi

TL;DR
This paper introduces TRAP, a benchmark for evaluating how web-based agents powered by large language models can be misled by prompt injection attacks, revealing systemic vulnerabilities in their task execution.
Contribution
The paper presents the TRAP benchmark and a modular framework to study and measure prompt injection vulnerabilities in web agents across multiple models.
Findings
Agents are susceptible to prompt injection in 25% of tasks on average.
Small interface or contextual changes can double attack success rates.
Vulnerabilities are systemic and psychologically driven.
Abstract
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25\% of tasks on average (13\% for GPT-5 to 43\% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled…
Peer Reviews
Decision·Submitted to ICLR 2026
- 630 task suites on realistic website clones to measure agent susceptibility. - Verifiable evaluation without reliance on LLM judges. - Interesting findings regarding the vulnerability of different models (hijack success rates ranging from 13% on GPT-5 to 43% on DeepSeek-R1). - Using realistic clones of popular websites from REAL (Garg et al., 2025). This is important as prompt injections are a major threat for agents, and prompt injection benchmarks in realistic environments are highly needed.
- My major concern is that the benchmark has a very low number of unique tasks (only 18). A total of 630 tasks are created by using 35 injection templates (7 persuasion principles × 5 LLM manipulation methods). The benchmark would be more useful with a larger number of unique tasks (say, at least 50 or better 100). Varying injection templates is less interesting, since they shouldn’t be assumed fixed (see the discussion on adaptive attacks in [Jailbreaking Leading Safety-Aligned LLMs with Simple
1. Frames persuasion-driven hijacks as modular components that can be recombined and extended. 2. Clear threat setup; fixed observation modality (AXTree) to control confounders; broad, factorized analysis across persuasion and manipulation methods with transferability measurements. 3. The five-component decomposition, location diagrams, and controlled comparisons (button vs hyperlink; targeted vs non-targeted prompts; tailored vs non-tailored) make the overall story legible.
1. TRAP’s modularity is compelling, but the paper does not empirically contrast against prior agent-security benchmarks (e.g., AgentDojo, AgentHarm, InjecAgent, etc) on overlapping attack types to show what conclusions change when using the one-click criterion. 2. Only text injections via buttons and hyperlinks are used in the core dataset; pop-ups, banners, multimedia, and richer UI elements are out of scope. 3. Using only accessibility trees improves control, but many deployed agents rely on
- Web agent hijacking is a real and growing threat. The focus on systematic evaluation is timely and valuable. - The 5-component framework (interface, persuasion, manipulation, location, tailoring) enables systematic ablation studies and is extensible to new attack types. - The paper systematically examines transferability, component effectiveness, interface types, location, and tailoring, which provides actionable insights.
- Only buttons and hyperlinks are tested; no images, pop-ups, audio, forms, or other realistic attack vectors. Also, one-click criterion is too simplistic, as in practice agent scaffolding can detect recover from errors (e.g., [1] discussed this in detail). Further discussion and justification of the one-click criterion are needed. - No defenses/controls are evaluated. This limits practical applicability. It would be interesting to see if input filters / output monitors can mitigate the problem.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpam and Phishing Detection · Adversarial Robustness in Machine Learning · Web Application Security Vulnerabilities
