Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs
Aayush Gupta

TL;DR
This paper introduces Contextual Integrity Verification (CIV), a cryptographically secured architecture that enforces source trust in LLMs at inference time, effectively preventing prompt injection attacks without degrading model performance.
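The summary says CIV is cryptographically secured and that each token carries a signed provenance label, but it does not give the signing scheme. The following is a hypothetical sketch only. It assumes HMAC-SHA256 over (position, token id, source), with a key held by the serving stack. The names `TRUST_LEVELS`, `LabeledToken`, `label_tokens`, and `verify` are illustrative and are not taken from the paper.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical source-trust levels; a higher value means more trusted.
TRUST_LEVELS = {"system": 3, "user": 2, "tool": 1, "web": 0}


@dataclass(frozen=True)
class LabeledToken:
    position: int
    token_id: int
    source: str
    tag: bytes  # HMAC over (position, token_id, source)


def _message(position: int, token_id: int, source: str) -> bytes:
    # Bind the label to both the token and its position so tags cannot be replayed elsewhere.
    return f"{position}:{token_id}:{source}".encode("utf-8")


def label_tokens(key: bytes, token_ids: list, source: str, start: int = 0) -> list:
    """Attach a signed provenance label to each token from one source (assumed scheme)."""
    if source not in TRUST_LEVELS:
        raise ValueError(f"unknown source: {source}")
    labeled = []
    for offset, tid in enumerate(token_ids):
        pos = start + offset
        tag = hmac.new(key, _message(pos, tid, source), hashlib.sha256).digest()
        labeled.append(LabeledToken(pos, tid, source, tag))
    return labeled


def verify(key: bytes, tok: LabeledToken) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _message(tok.position, tok.token_id, tok.source), hashlib.sha256).digest()
    return hmac.compare_digest(expected, tok.tag)
```

Because the tag covers the source field, text from a low-trust source (say, a retrieved web page) cannot relabel its own tokens as `"system"` without the key. Any such relabeled token fails `verify` and could be rejected before the attention mask is built.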
Contribution
CIV is a novel, lightweight security architecture that provides provable, per-token non-interference guarantees for frozen LLMs against prompt injection attacks.
Findings
CIV achieves a 0% attack success rate on prompt-injection benchmarks (Elite-Attack + SoK-246) under the stated threat model.
CIV maintains 93.1% token-level similarity on benign tasks.
CIV introduces minimal latency overhead and requires no fine-tuning.
Abstract
Large language models (LLMs) remain acutely vulnerable to prompt injection and related jailbreak attacks; heuristic guardrails (rules, filters, LLM judges) are routinely bypassed. We present Contextual Integrity Verification (CIV), an inference-time security architecture that attaches cryptographically signed provenance labels to every token and enforces a source-trust lattice inside the transformer via a pre-softmax hard attention mask (with optional FFN/residual gating). CIV provides deterministic, per-token non-interference guarantees on frozen models: lower-trust tokens cannot influence higher-trust representations. On benchmarks derived from recent taxonomies of prompt-injection vectors (Elite-Attack + SoK-246), CIV attains 0% attack success rate under the stated threat model while preserving 93.1% token-level similarity and showing no degradation in model perplexity on benign…
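The core mechanism described above is a hard mask applied to attention scores before the softmax. The following is a minimal single-head sketch of that idea. It is not the paper's implementation and makes several assumptions. Trust levels are integers, totally ordered, with higher meaning more trusted; a general lattice would use its partial order instead. Decoding is causal. A query token may attend to a key token only if the key's trust is at least the query's. The FFN/residual gating is omitted, and the function names `trust_mask` and `masked_attention` are illustrative.

```python
import math


def trust_mask(trust: list) -> list:
    """allowed[i][j] is True iff query i may attend to key j:
    causal (j <= i) and the key is no less trusted than the query."""
    n = len(trust)
    return [[j <= i and trust[j] >= trust[i] for j in range(n)] for i in range(n)]


def masked_attention(q: list, k: list, v: list, trust: list) -> list:
    """Single-head scaled dot-product attention with a hard pre-softmax trust mask.
    q, k, v are lists of float vectors, one row per token."""
    n, d = len(q), len(q[0])
    dv = len(v[0])
    allowed = trust_mask(trust)
    out = []
    for i in range(n):
        # Disallowed keys get -inf before the softmax, so their weight is exactly 0.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if allowed[i][j] else -math.inf
            for j in range(n)
        ]
        m = max(scores)  # finite: every token may attend to itself
        weights = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(weights[j] * v[j][c] for j in range(n)) / total for c in range(dv)])
    return out
```

With trust levels such as `[3, 3, 0, 2]` (system, system, web, user), the output for every token with trust above 0 is bit-for-bit unchanged however the web token's keys and values are perturbed. That is the per-token non-interference property the abstract describes, shown here for a single layer. Because the mask is hard rather than learned, the guarantee does not depend on the model's weights, which is consistent with applying it to frozen models.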
