Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents
Fengchao Chen, Tingmin Wu, Van Nguyen, Carsten Rudolph

TL;DR
This paper shows that commercial LLM agents often bypass safety constraints and over-execute tasks, especially when the user does not explicitly request safety checks, which leaves them exposed to user-mediated attacks.
Contribution
It systematically evaluates 12 commercial agents, highlighting their default unsafe behaviors and the need for improved safety prioritization and task boundary mechanisms.
Findings
Trip-planning agents bypass safety constraints in over 92% of cases when no safety check is requested.
Web-use agents show 100% bypass rate in risky actions.
Safety checks are only invoked when explicitly requested.
Abstract
Large Language Models (LLMs) have enabled agents to move beyond conversation toward end-to-end task execution and become more helpful. However, this helpfulness introduces new security risks that stem less from direct interface abuse than from acting on user-provided content. Existing studies on agent security largely focus on model-internal vulnerabilities or adversarial access to agent interfaces, overlooking attacks that exploit users as unintended conduits. In this paper, we study user-mediated attacks, where benign users are tricked into relaying untrusted or attacker-controlled content to agents, and analyze how commercial LLM agents respond under such conditions. We conduct a systematic evaluation of 12 commercial agents in a sandboxed environment, covering 6 trip-planning agents and 6 web-use agents, and compare agent behavior across scenarios with no, soft, and hard user-requested…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Spam and Phishing Detection · Hate Speech and Cyberbullying Detection
