The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Manish Bhatt; Sarthak Munshi; Vineeth Sai Narajala; Idan Habler; Ammar Al-Kahfah; Ken Huang; Joel Webb; Blake Gatto; Md Tamjidul Hoque

arXiv:2604.06436·cs.CR·April 14, 2026

TL;DR

This paper proves fundamental limits on continuous, utility-preserving prompt injection defenses for language models. It establishes a trilemma: a wrapper defense cannot be continuous, utility-preserving, and complete at the same time.

Contribution

It gives a formal framework showing that any continuous, utility-preserving wrapper defense must fail on some inputs, proves parallel results for discrete and multi-turn settings, and verifies the theory mechanically in Lean 4 and empirically on three LLMs.

Findings

01

No continuous, utility-preserving wrapper can make all outputs strictly safe.

02

Under a transversality condition, a positive-measure subset of inputs remains strictly unsafe.

03

The results are validated both mechanically in Lean 4 and empirically on three LLMs.
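
The first finding can be motivated by a short topological argument. The sketch below is one plausible reading, not the paper's own proof: the risk score $f$, the threshold $\tau$, the safe set $S$, the reading of utility preservation as "$D$ is the identity on $S$", and the Hausdorff assumption on $X$ are all assumptions introduced here for illustration.

```latex
% Assumed setup (hypothetical notation, not taken from the paper):
%   f : X \to \mathbb{R} is a continuous risk score with threshold \tau,
%   S = \{ x \in X : f(x) < \tau \} is the strictly safe set, with \emptyset \neq S \neq X,
%   utility preservation means D(x) = x for every x \in S,
%   X is connected and Hausdorff, and D : X \to X is continuous.
\begin{align*}
&\text{(1) } S \text{ is open, } \emptyset \neq S \neq X,\ X \text{ connected}
   \implies \exists\, z \in \overline{S} \setminus S. \\
&\text{(2) } z \in \overline{S} \implies f(z) \le \tau, \qquad
   z \notin S \implies f(z) \ge \tau, \qquad \text{so } f(z) = \tau. \\
&\text{(3) } D|_{S} = \mathrm{id}_{S} \text{ and } D \text{ continuous}
   \implies D(z) = z. \\
&\text{(4) } f(D(z)) = f(z) = \tau, \text{ which is not strictly below } \tau.
\end{align*}
```

Step (1) uses connectedness: if $S$ had no boundary points it would be both open and closed, which forces $S = \emptyset$ or $S = X$. Step (3) uses the fact that two continuous maps into a Hausdorff space that agree on a set also agree on its closure.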

Abstract

We prove that no continuous, utility-preserving wrapper defense (a function $D : X \to X$ that preprocesses inputs before the model sees them) can make all outputs strictly safe for a language model with connected prompt space, and we characterize exactly where every such defense must fail. We establish three results under successively stronger hypotheses: boundary fixation (the defense must leave some threshold-level inputs unchanged); an $\epsilon$-robust constraint (under Lipschitz regularity, a positive-measure band around fixed boundary points remains near-threshold); and a persistent unsafe region (under a transversality condition, a positive-measure subset of inputs remains strictly unsafe). These constitute a defense trilemma: continuity, utility preservation, and completeness cannot coexist. We prove parallel discrete results requiring no topology, and extend to multi-turn interactions,…
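
A one-dimensional toy example may help make boundary fixation and the near-threshold band concrete. It is a minimal sketch under assumptions that do not come from the paper: the prompt space is taken to be the interval $[0, 1]$, `risk` is a hypothetical continuous risk score, `TAU` is a hypothetical threshold, and `defense` is one hand-picked continuous wrapper that is the identity on safe inputs and reflects unsafe inputs back across the threshold.

```python
import numpy as np

TAU = 0.5  # hypothetical safety threshold; inputs with risk < TAU count as strictly safe


def risk(x):
    """Hypothetical continuous risk score on the toy prompt space X = [0, 1]."""
    return np.asarray(x, dtype=float)


def defense(x):
    """Toy wrapper D: identity on safe inputs, reflection across TAU otherwise.

    D is continuous (Lipschitz constant 1) and utility-preserving in the
    assumed sense that D(x) = x whenever risk(x) < TAU.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x < TAU, x, 2 * TAU - x)


def near_threshold_fraction(eps, n=100_001):
    """Fraction of a uniform grid on [0, 1] whose defended risk is within eps of TAU."""
    xs = np.linspace(0.0, 1.0, n)
    gap = np.abs(risk(defense(xs)) - TAU)
    return float(np.mean(gap <= eps))


if __name__ == "__main__":
    # Boundary fixation: the threshold-level input is left unchanged.
    print("D(TAU) =", float(defense(TAU)))
    # Near-threshold band: a positive fraction of inputs stays close to TAU.
    print("fraction within 0.05 of TAU:", near_threshold_fraction(0.05))
```

In this toy case every input strictly above the threshold is mapped to a strictly safe output, yet the threshold point itself stays fixed and a band of inputs on either side of it remains near-threshold after the defense. The third result (a strictly unsafe region under transversality) is not illustrated here.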
