Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents

Fengchao Chen; Tingmin Wu; Van Nguyen; Carsten Rudolph

arXiv:2601.10758·cs.CR·January 19, 2026

Open Access

TL;DR

This paper shows that commercial LLM agents often bypass safety constraints and over-execute tasks, especially when users do not explicitly request safety checks, leaving them exposed to user-mediated attacks.

Contribution

It systematically evaluates 12 commercial agents, highlighting their default unsafe behaviors and the need for improved safety prioritization and task boundary mechanisms.

Findings

01

Over 92% of trip-planning agents bypass safety without prompts.

02

Web-use agents show a 100% bypass rate for risky actions.

03

Safety checks are only invoked when explicitly requested.

Abstract

Large Language Models (LLMs) have enabled agents to move beyond conversation toward end-to-end task execution and become more helpful. However, this helpfulness introduces new security risks that stem less from direct interface abuse than from acting on user-provided content. Existing studies on agent security largely focus on model-internal vulnerabilities or adversarial access to agent interfaces, overlooking attacks that exploit users as unintended conduits. In this paper, we study user-mediated attacks, where benign users are tricked into relaying untrusted or attacker-controlled content to agents, and analyze how commercial LLM agents respond under such conditions. We conduct a systematic evaluation of 12 commercial agents in a sandboxed environment, covering 6 trip-planning agents and 6 web-use agents, and compare agent behavior across scenarios with no, soft, and hard user-requested…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Spam and Phishing Detection · Hate Speech and Cyberbullying Detection