Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems
Yuji Yamamoto, Satoshi Matsuura

TL;DR
This paper examines bit flips in shared KV-cache blocks of LLM serving systems as a security vulnerability. It identifies three properties (silent divergence, selective propagation, and persistent accumulation of damage) and proposes a checksum-based detection countermeasure.
Contribution
The paper identifies a previously unexamined bit-flip vulnerability in shared KV-cache blocks of LLM serving systems and introduces a checksum-based method for detecting the corruption.
Findings
Flips in 13 of 16 BF16 bit positions cause silent divergence: outputs remain coherent but differ from the clean baseline.
Only requests sharing the targeted prefix are affected.
The proposed checksum detects single-bit corruption with negligible overhead.
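As a rough illustration of the checksum finding, the sketch below stores a checksum next to a cached block and re-verifies it before reuse. The summary does not say which checksum the authors use or where verification happens; CRC32 (via Python's `zlib.crc32`) and the names `ChecksummedBlock` and `flip_bit_in_place` are assumptions made only for this example, not the paper's implementation or any vLLM API.

```python
import zlib


class ChecksummedBlock:
    """Hypothetical wrapper: a block of cache bytes plus a CRC32 taken at write time."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        # Checksum recorded once, when the block is first filled.
        self.checksum = zlib.crc32(self.data)

    def verify(self) -> bool:
        # Recompute over the current contents and compare with the stored value.
        return zlib.crc32(self.data) == self.checksum


def flip_bit_in_place(buf: bytearray, bit_index: int) -> None:
    """Simulate a single-bit fault at the given bit offset in the buffer."""
    buf[bit_index // 8] ^= 1 << (bit_index % 8)


block = ChecksummedBlock(bytes(range(64)))
print(block.verify())            # clean block passes
flip_bit_in_place(block.data, 37)
print(block.verify())            # single-bit corruption is detected
```

CRC32 is guaranteed to detect any single-bit error in the covered data, which is why it suffices for this sketch. A real serving system would also have to decide when to verify (for example, on each prefix-cache hit) and what to do on a mismatch.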
Abstract
Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching, these blocks exist as a single physical copy without integrity protection. Using software fault injection under ideal bit targeting, we characterize worst-case severity and identify three properties: (1) Silent divergence - 13 of 16 BF16 bit positions produce coherent but altered outputs, indistinguishable from legitimate responses without a clean baseline. (2) Selective propagation - only requests sharing the targeted prefix are affected. (3) Persistent accumulation - no temporal decay occurs, so cumulative damage grows linearly with subsequent requests. Together, these constitute a threat profile distinct from weight corruption: silent divergence and selective propagation enable…
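To make the notion of a BF16 bit position concrete, the following sketch flips individual bits of one BF16 value in software. It relies only on the standard layout of BF16 (1 sign bit, 8 exponent bits, 7 mantissa bits, equal to the upper half of an IEEE 754 float32). The helper names are hypothetical, float32 values are truncated rather than rounded for simplicity, and nothing here reproduces the authors' fault-injection harness or says which 13 positions they found to be silent.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the BF16 bit pattern of x by truncating its float32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bf16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def flip_bit(bits16: int, position: int) -> int:
    """Flip one bit; position 0 is the lowest mantissa bit, 15 is the sign bit."""
    if not 0 <= position < 16:
        raise ValueError("BF16 has bit positions 0 through 15")
    return bits16 ^ (1 << position)


original = float_to_bf16_bits(1.0)  # 0x3F80
for position in (0, 14, 15):
    corrupted = bf16_bits_to_float(flip_bit(original, position))
    print(f"bit {position:2d}: 1.0 -> {corrupted}")
```

For the value 1.0, flipping the lowest mantissa bit gives 1.0078125, flipping the sign bit gives -1.0, and flipping the top exponent bit gives infinity. The spread shows why the effect of a flip depends strongly on its position within the 16 bits.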
