Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

Kexin Chu; Zecheng Lin; Dawei Xiang; Zixu Shen; Jianchang Su; Cheng Chu; Yiwei Yang; Wenhui Zhang; Wenfei Wu; Wei Zhang

arXiv:2508.08438 · cs.CR · February 11, 2026

TL;DR

SafeKV is a system that enables privacy-preserving shared KV-cache management in large language model inference, reducing timing side-channel risks while maintaining high performance.

Contribution

The paper introduces a co-designed system that combines detection, isolation, and safeguards to prevent sensitive data from leaking through a shared KV-cache.

Findings

01 Reduces time-to-first-token (TTFT) overhead by up to 40.58% compared to full isolation.

02 Increases throughput by up to 2.66x while maintaining privacy.

03 Effectively mitigates timing side-channels in multi-tenant LLM inference.

Abstract

Global KV-cache sharing is an effective optimization for accelerating large language model (LLM) inference, yet it introduces an API-visible timing side channel that lets adversaries infer sensitive user inputs from shared entries, leading to cross-tenant privacy risks. To address this problem, we introduce SafeKV (Secure and Flexible KV-cache Sharing), a system-level co-design of privacy enforcement and KV-cache management. SafeKV integrates lightweight detection and isolation directly into the serving runtime to eliminate cross-tenant reuse of sensitive KV-cache blocks under our threat model, while recovering most of the performance benefits of global sharing. Our key contributions are: (1) a three-tier asynchronous detection pipeline that decouples privacy classification from inference and supports streaming workloads, (2) a unified radix-tree-based memory manager with path…
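
To make the idea of selective sharing concrete, here is a minimal Python sketch of a block-level prefix cache in which non-sensitive blocks are shared globally and sensitive blocks are visible only to the tenant that created them. It is an illustration under stated assumptions, not SafeKV's implementation: the names `PrefixCache`, `Node`, `BLOCK`, and `toy_detector` are hypothetical, a plain dictionary trie stands in for the radix tree, and the detector is a placeholder for the paper's detection pipeline. The sketch also assumes that every block after a sensitive block stays private, since its cached state depends on the sensitive prefix.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

BLOCK = 4  # tokens per cache block (hypothetical value)
Block = Tuple[int, ...]
# Child key: (scope, block). scope is None for shared entries,
# or the owning tenant's id for private entries.
Key = Tuple[Optional[str], Block]


@dataclass
class Node:
    children: Dict[Key, "Node"] = field(default_factory=dict)


def blocks(tokens: List[int]) -> List[Block]:
    """Split tokens into full blocks; a trailing partial block is not cached."""
    return [tuple(tokens[i:i + BLOCK])
            for i in range(0, len(tokens) - BLOCK + 1, BLOCK)]


class PrefixCache:
    def __init__(self) -> None:
        self.root = Node()

    def insert(self, tenant: str, tokens: List[int],
               is_sensitive: Callable[[Block], bool]) -> None:
        node, private = self.root, False
        for block in blocks(tokens):
            # Assumption: once a prefix contains a sensitive block,
            # every later block on that path is private as well.
            private = private or is_sensitive(block)
            scope = tenant if private else None
            node = node.children.setdefault((scope, block), Node())

    def match(self, tenant: str, tokens: List[int]) -> int:
        """Return how many leading blocks this tenant may reuse."""
        node, hits = self.root, 0
        for block in blocks(tokens):
            nxt = node.children.get((None, block))        # shared entry
            if nxt is None:
                nxt = node.children.get((tenant, block))  # tenant's own entry
            if nxt is None:
                break
            node, hits = nxt, hits + 1
        return hits


def toy_detector(block: Block) -> bool:
    # Stand-in for a real privacy classifier: token id 999 marks sensitive text.
    return 999 in block


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]
secret = [999, 5, 6, 7]
cache.insert("alice", system_prompt + secret, toy_detector)

alice_hits = cache.match("alice", system_prompt + secret)
bob_hits = cache.match("bob", system_prompt + secret)
```

In this toy run Alice can reuse both of her blocks, while Bob can reuse only the shared system-prompt block. Bob's probe of Alice's sensitive block misses exactly as it would if Alice had never sent it, so in this simplified model the cache hit pattern he could observe through timing does not reveal her input.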

Taxonomy

Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Parallel Computing and Optimization Techniques