AI Agents May Always Fall for Prompt Injections

Sahar Abdelnabi; Eugene Bagdasarian

arXiv:2605.17634·cs.CR·May 19, 2026

TL;DR

Prompt injection is a critical vulnerability in deployed AI agents, and the prevailing defense paradigm (data-instruction separation) is insufficient; reframing the problem through Contextual Integrity offers a principled way to evaluate agents and improve their security.

Contribution

Reframes prompt injection attacks using Contextual Integrity theory, revealing fundamental limitations of existing defenses and proposing a new evaluation framework for context-sensitive failures.

Findings

01. Current defenses fail against contextual manipulation attacks.

02. An impossibility result shows defenders cannot perfectly block malicious flows.

03. CI-based framework enables principled evaluation and alignment of AI agents.

Abstract

Prompt injection is the most critical vulnerability in deployed AI agents. Despite recent progress, we show that the prevailing defense paradigm (data-instruction separation) both fails to detect attacks that operate through contextual manipulation and degrades contextually appropriate behavior. We then recast prompt injection through the lens of Contextual Integrity (CI), a privacy theory that judges whether information flows comply with contextual norms. This explains the types of attacks that current defenses attempt to patch and predicts advanced ones that future agents will face. We develop unique benign and attack scenarios that force an agent to violate the norms by (1) misrepresenting the flow, (2) manipulating norms, or (3) mixing multiple flows. This reframing suggests an impossibility result: an adversary can always construct a context under which a blocked flow appears legitimate, or a…
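To make the CI framing concrete, here is a minimal sketch of a norm check over information flows. It is an illustration only, not the authors' framework: the `Flow` class, the `HEALTHCARE_NORMS` set, and the `is_appropriate` function are hypothetical names, and the healthcare scenario is an assumed example. The five fields follow the standard CI description of a flow (sender, recipient, subject, information type, transmission principle).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One information flow, described by the five CI parameters."""
    sender: str
    recipient: str
    subject: str
    info_type: str
    principle: str  # transmission principle, e.g. "confidentiality"


# Hypothetical norms for an illustrative "healthcare" context:
# (recipient, info_type, principle) triples considered appropriate.
HEALTHCARE_NORMS = {
    ("doctor", "health_record", "confidentiality"),
}


def is_appropriate(flow: Flow, norms: set) -> bool:
    """Return True if the flow, as described, matches an allowed norm."""
    return (flow.recipient, flow.info_type, flow.principle) in norms


# The flow that actually happens: records go to an advertiser.
actual = Flow("agent", "advertiser", "patient", "health_record", "sale")

# The same flow as an injected message describes it to the agent.
claimed = Flow("agent", "doctor", "patient", "health_record", "confidentiality")

print(is_appropriate(actual, HEALTHCARE_NORMS))   # False
print(is_appropriate(claimed, HEALTHCARE_NORMS))  # True
```

In this toy setup the check is only as reliable as the flow description it receives. If an adversary controls that description, a flow the norms would block can be presented as one they allow, which appears to be the gap the "misrepresenting the flow" scenario targets.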
