TL;DR
This paper reveals that clarification-seeking behavior in LLM agents significantly increases their vulnerability to prompt injection attacks, challenging assumptions about its security benefits.
Contribution
The study introduces ASPI (Ambiguous-State Prompt Injection), a benchmark of 728 task-attack scenarios that measures how entering a clarification state affects an LLM agent's vulnerability to prompt injection.
Findings
Clarification-seeking amplifies attack success rates from around 2% to over 35%.
Agents evaluated in the clarification setting are more vulnerable to prompt injection than the same agents in the matched execution setting.
Shifts in model behavior during clarification change how the agent processes content, which in turn affects its vulnerability.
Abstract
Clarification-seeking behavior is widely regarded as a desirable property of LLM agents, enabling them to resolve ambiguity before acting on underspecified tasks. However, the security implications of this interaction pattern remain unexplored. We investigate whether the transition from standard execution to a clarification-seeking state increases an agent's susceptibility to prompt injection attacks. We introduce ASPI (Ambiguous-State Prompt Injection), a benchmark of 728 task-attack scenarios that isolates clarification as a distinct agent state and measures how this state transition affects vulnerability under controlled conditions. Each benchmark instance is evaluated under matched execution and clarification settings: in the execution setting, the agent acts on a fully specified instruction and encounters adversarial content only through tool-returned data; in the clarification…
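The matched-setting comparison described in the abstract can be illustrated with a small sketch. This is not the ASPI harness: the names `TrialResult`, `attack_success_rate`, and `matched_flips` are hypothetical, and the toy records are invented for illustration only and are not results from the paper. The sketch assumes each scenario yields one boolean outcome per setting (attack succeeded or not) and computes the attack success rate per setting, plus the scenarios where the attack fails under execution but succeeds under clarification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    """One evaluation outcome (hypothetical record layout, not from the paper)."""
    scenario_id: str
    setting: str            # "execution" or "clarification"
    attack_succeeded: bool


def attack_success_rate(results, setting):
    """Fraction of trials in the given setting where the injection succeeded."""
    trials = [r for r in results if r.setting == setting]
    if not trials:
        raise ValueError(f"no trials recorded for setting {setting!r}")
    return sum(r.attack_succeeded for r in trials) / len(trials)


def matched_flips(results):
    """Scenario ids where the attack failed under execution but
    succeeded under clarification for the same scenario."""
    by_scenario = defaultdict(dict)
    for r in results:
        by_scenario[r.scenario_id][r.setting] = r.attack_succeeded
    return sorted(
        sid
        for sid, outcome in by_scenario.items()
        if outcome.get("execution") is False
        and outcome.get("clarification") is True
    )


# Invented toy data, for illustration only (not benchmark results).
toy_results = [
    TrialResult("s1", "execution", False),
    TrialResult("s1", "clarification", True),
    TrialResult("s2", "execution", False),
    TrialResult("s2", "clarification", False),
    TrialResult("s3", "execution", True),
    TrialResult("s3", "clarification", True),
    TrialResult("s4", "execution", False),
    TrialResult("s4", "clarification", True),
]

if __name__ == "__main__":
    for setting in ("execution", "clarification"):
        print(setting, attack_success_rate(toy_results, setting))
    print("flipped scenarios:", matched_flips(toy_results))
```

Pairing outcomes by scenario, rather than comparing only the two aggregate rates, is what lets a difference be attributed to the state transition itself, since the task and attack are held fixed across the two settings.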
