Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference

Zhifan Luo; Shuo Shao; Su Zhang; Lijing Zhou; Yuke Hu; Chenxu Zhao; Zhihao Liu; Zhan Qin

arXiv:2508.09442 · cs.CR · March 25, 2026

TL;DR

This paper uncovers privacy vulnerabilities in the KV-cache used in LLM inference. It demonstrates attacks that reconstruct sensitive user inputs from the cache and proposes KV-Cloak, a lightweight defense that mitigates these risks with minimal impact on model accuracy and performance.

Contribution

The paper provides the first comprehensive analysis of KV-cache privacy risks, introduces three attack vectors (Inversion, Collision, and Injection), and proposes KV-Cloak, an efficient obfuscation method to secure the cache.

Findings

01

Attacks can reconstruct sensitive user inputs directly from the KV-cache.

02

KV-Cloak effectively prevents reconstruction attacks.

03

KV-Cloak has minimal impact on model accuracy and adds little performance overhead.

Abstract

The Key-Value (KV) cache, which stores intermediate attention computations (Key and Value pairs) to avoid redundant calculations, is a fundamental mechanism for accelerating Large Language Model (LLM) inference. However, this efficiency optimization introduces significant yet underexplored privacy risks. This paper provides the first comprehensive analysis of these vulnerabilities, demonstrating that an attacker can reconstruct sensitive user inputs directly from the KV-cache. We design and implement three distinct attack vectors: a direct Inversion Attack, a more broadly applicable and potent Collision Attack, and a semantic-based Injection Attack. These methods demonstrate the practicality and severity of KV-cache privacy leakage issues. To mitigate this, we propose KV-Cloak, a novel, lightweight, and efficient defense mechanism. KV-Cloak uses a reversible matrix-based obfuscation…
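To make the mechanism in the abstract concrete, here is a minimal sketch of a KV-cache for a single attention head, written in plain Python. It is not taken from the paper: the `KVCache` class, the toy 2x2 projection matrices `W_Q`, `W_K`, `W_V`, and the three-token input are all hypothetical, chosen only to show that caching each token's Key and Value gives the same outputs as recomputing them at every step.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [dot(row, x) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Hypothetical cache: one Key and one Value vector per processed token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


# Toy projection matrices (assumed values, for illustration only).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, -0.5], [0.25, 1.0]]
W_V = [[1.0, 2.0], [-1.0, 0.5]]


def decode_with_cache(embeddings):
    """Project each token once and reuse its stored Key/Value at later steps."""
    cache = KVCache()
    outputs = []
    for x in embeddings:
        cache.append(matvec(W_K, x), matvec(W_V, x))
        outputs.append(attend(matvec(W_Q, x), cache.keys, cache.values))
    return outputs, cache


def decode_without_cache(embeddings):
    """Recompute Keys and Values for the whole prefix at every step."""
    outputs = []
    for t in range(len(embeddings)):
        prefix = embeddings[: t + 1]
        keys = [matvec(W_K, x) for x in prefix]
        values = [matvec(W_V, x) for x in prefix]
        outputs.append(attend(matvec(W_Q, embeddings[t]), keys, values))
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token embeddings
cached_out, cache = decode_with_cache(tokens)
plain_out = decode_without_cache(tokens)
```

Both functions produce the same outputs, but the cached version projects each token only once. After decoding, `cache.keys` and `cache.values` hold one entry per input token, which is the stored state the attacks described in the abstract aim to exploit.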
