Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs

Aayush Gupta

arXiv:2508.09288 · cs.CR · August 20, 2025

TL;DR

This paper introduces Contextual Integrity Verification (CIV), a cryptographically secured architecture that enforces source trust in LLMs at inference time. Under its stated threat model, CIV blocks prompt injection attacks while preserving the model's behavior on benign tasks.
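CIV attaches cryptographically signed provenance labels to tokens so that a token cannot simply claim a higher trust level than its true source. This page does not describe the paper's signing scheme, so the following is only a minimal Python sketch that assumes HMAC-SHA256 over a (token id, position, source) triple. The trust levels in `TRUST`, the `LabeledToken` type, the helpers `sign_tokens`, `verify` and `trust_levels`, and the fail-closed policy for unverifiable labels are all hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical source-trust levels (higher = more trusted). CIV uses a
# source-trust lattice; these names and values are illustrative only.
TRUST = {"system": 3, "user": 2, "tool": 1, "web": 0}
LOWEST = min(TRUST.values())


@dataclass(frozen=True)
class LabeledToken:
    token_id: int
    position: int
    source: str
    tag: bytes  # HMAC-SHA256 over (token_id, position, source)


def _message(token_id: int, position: int, source: str) -> bytes:
    # Binding the label to both token id and position means a valid tag
    # cannot be copied onto a different token or slot.
    return f"{token_id}|{position}|{source}".encode("utf-8")


def sign_tokens(token_ids, source, key, start=0):
    """Label a run of tokens from one source, starting at position `start`."""
    labeled = []
    for offset, token_id in enumerate(token_ids):
        position = start + offset
        tag = hmac.new(key, _message(token_id, position, source), hashlib.sha256).digest()
        labeled.append(LabeledToken(token_id, position, source, tag))
    return labeled


def verify(token, key):
    expected = hmac.new(
        key, _message(token.token_id, token.position, token.source), hashlib.sha256
    ).digest()
    return hmac.compare_digest(expected, token.tag)  # constant-time comparison


def trust_levels(tokens, key):
    """Map each token to a trust level, failing closed on bad signatures."""
    return [TRUST[t.source] if verify(t, key) else LOWEST for t in tokens]


key = b"demo-key"  # in practice, a secret held only by the trusted runtime
prompt = sign_tokens([101, 102], "system", key) + sign_tokens([201], "web", key, start=2)
# An attacker relabels a web token as "system" but cannot forge a matching tag.
forged = LabeledToken(202, 3, "system", prompt[2].tag)
print(trust_levels(prompt + [forged], key))  # [3, 3, 0, 0]
```

In this sketch a token whose tag does not match its claimed source is demoted to the lowest trust level rather than rejected; dropping such tokens outright would be an equally plausible policy.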

Contribution

CIV is a novel, lightweight security architecture that provides provable, per-token non-interference guarantees for frozen LLMs against prompt injection attacks.

Findings

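1. CIV achieves a 0% attack success rate on benchmarks derived from recent prompt-injection taxonomies (Elite-Attack + SoK-246), under the stated threat model.

2. CIV maintains 93.1% token-level similarity on benign tasks, with no degradation in model perplexity.

3. CIV adds minimal latency overhead and requires no fine-tuning.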

Abstract

Large language models (LLMs) remain acutely vulnerable to prompt injection and related jailbreak attacks; heuristic guardrails (rules, filters, LLM judges) are routinely bypassed. We present Contextual Integrity Verification (CIV), an inference-time security architecture that attaches cryptographically signed provenance labels to every token and enforces a source-trust lattice inside the transformer via a pre-softmax hard attention mask (with optional FFN/residual gating). CIV provides deterministic, per-token non-interference guarantees on frozen models: lower-trust tokens cannot influence higher-trust representations. On benchmarks derived from recent taxonomies of prompt-injection vectors (Elite-Attack + SoK-246), CIV attains 0% attack success rate under the stated threat model while preserving 93.1% token-level similarity and showing no degradation in model perplexity on benign…
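The core rule, that lower-trust tokens cannot influence higher-trust representations, can be illustrated with a single attention head. This is a minimal pure-Python sketch, not the paper's implementation. It assumes integer trust levels where larger means more trusted, lets query i attend to key j only when j <= i (causal) and trust[j] >= trust[i], and applies the mask by setting disallowed scores to -inf before the softmax so they receive exactly zero weight. The names `trust_mask` and `masked_attention` and the toy vectors are hypothetical.

```python
import math

NEG_INF = float("-inf")


def trust_mask(trust, causal=True):
    """mask[i][j] is True when query i may attend to key j.

    Assumed rule: j must not be in the future (if causal) and must be at
    least as trusted as i, so lower-trust keys never reach higher-trust queries.
    """
    n = len(trust)
    return [
        [(not causal or j <= i) and trust[j] >= trust[i] for j in range(n)]
        for i in range(n)
    ]


def masked_attention(q, k, v, trust):
    """Single-head scaled dot-product attention with a hard pre-softmax mask."""
    n, d = len(q), len(q[0])
    mask = trust_mask(trust)
    out = []
    for i in range(n):
        # Disallowed positions get -inf *before* softmax, so their weight is exactly 0.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if mask[i][j] else NEG_INF
            for j in range(n)
        ]
        m = max(scores)  # finite: every token may always attend to itself
        exps = [math.exp(s - m) if s != NEG_INF else 0.0 for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out


# Tokens 0-1 are high trust (e.g. system prompt), tokens 2-3 low trust (e.g. web content).
trust = [3, 3, 0, 0]
q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
k = [[0.2, 0.1], [0.3, -0.4], [0.9, 0.9], [-1.0, 0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
base = masked_attention(q, k, v, trust)

# Arbitrarily rewrite the low-trust tokens (an "injection").
q2 = q[:2] + [[9.0, -9.0], [4.0, 4.0]]
k2 = k[:2] + [[50.0, 50.0], [-3.0, 7.0]]
v2 = v[:2] + [[100.0, 100.0], [-100.0, 0.0]]
attacked = masked_attention(q2, k2, v2, trust)

print(base[:2] == attacked[:2])  # True: high-trust outputs are unchanged
```

Because the rule trust[j] >= trust[i] is transitive, and position-wise feed-forward layers and residual connections do not mix positions, applying the same mask in every layer keeps low-trust content from reaching high-trust positions through chains of attention in this simplified model. The optional FFN/residual gating mentioned in the abstract is not modeled here.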