MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference

Wenxuan Zeng; Ye Dong; Jinjin Zhou; Jin Tan; Lei Wang; Tao Wei; Runsheng Wang; Meng Li

arXiv:2501.06807 · cs.CR · October 21, 2025

TL;DR

MPCache introduces an MPC-friendly KV cache eviction framework that reduces latency and communication overhead in private LLM inference by selectively discarding unimportant cache entries and activating only relevant ones.

Contribution

It proposes an MPC-compatible KV cache eviction method that combines a look-once static eviction algorithm with a query-aware dynamic selection algorithm, plus further optimizations, to improve the efficiency of private LLM inference.

Findings

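1. Achieves 1.8x to 2.01x decoding latency reduction across different sequence lengths
2. Achieves 3.39x to 8.37x communication reduction across different sequence lengths
3. Outperforms prior KV cache eviction baselines across different generation tasks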

Abstract

Private large language model (LLM) inference based on secure multi-party computation (MPC) achieves formal data privacy protection but suffers from significant latency overhead, especially for long input sequences. While key-value (KV) cache eviction and sparse attention algorithms have been proposed for efficient LLM inference in plaintext, they are not designed for MPC and cannot benefit private LLM inference directly. In this paper, we propose an accurate and MPC-friendly KV cache eviction framework, dubbed MPCache, building on the observation that historical tokens in a long sequence may have different effects on the downstream decoding. Hence, MPCache combines a look-once static eviction algorithm to discard unimportant KV cache and a query-aware dynamic selection algorithm to activate only a small subset of KV cache for attention computation. MPCache further incorporates a series…
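The two-stage idea described in the abstract (evict once, then select per query) can be illustrated with a small plaintext sketch. This is a hypothetical illustration, not MPCache's algorithm: the importance score (attention mass accumulated once over the prompt queries) and the per-step top-k selection by query-key dot product are assumptions borrowed from common plaintext eviction heuristics, and the function names `static_evict`, `dynamic_select`, and `sparse_attention` are made up for this example. Nothing here is secret-shared, and none of the paper's MPC-specific optimizations are modeled.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def static_evict(prompt_queries: Sequence[Vec], keys: Sequence[Vec], keep: int) -> List[int]:
    """Look-once static eviction (HYPOTHETICAL scoring rule, not the paper's).

    Runs a single time after prefill: each cached token's importance is the
    attention probability it receives, summed over all prompt queries
    (no causal mask, for brevity). Returns the indices of the `keep` most
    important tokens in their original order; the rest are evicted for good.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    importance = [0.0] * len(keys)
    for q in prompt_queries:
        probs = softmax([dot(q, k) * scale for k in keys])
        for j, p in enumerate(probs):
            importance[j] += p
    ranked = sorted(range(len(keys)), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:keep])


def dynamic_select(query: Vec, keys: Sequence[Vec], retained: Sequence[int], budget: int) -> List[int]:
    """Query-aware dynamic selection (HYPOTHETICAL rule, not the paper's).

    At each decoding step, activate only the `budget` retained tokens whose
    keys have the largest dot product with the current query.
    """
    ranked = sorted(retained, key=lambda j: dot(query, keys[j]), reverse=True)
    return sorted(ranked[:budget])


def sparse_attention(query: Vec, keys: Sequence[Vec], values: Sequence[Vec], selected: Sequence[int]) -> List[float]:
    """Scaled dot-product attention restricted to the selected cache entries."""
    scale = 1.0 / math.sqrt(len(query))
    probs = softmax([dot(query, keys[j]) * scale for j in selected])
    out = [0.0] * len(values[0])
    for p, j in zip(probs, selected):
        for d, v in enumerate(values[j]):
            out[d] += p * v
    return out
```

For example, with keys `[[1, 0], [0, 1], [1, 1], [-1, 0]]` and prompt queries `[[1, 0], [1, 1]]`, `static_evict(..., keep=3)` permanently drops token 3, and `dynamic_select([1, 0], keys, [0, 1, 2], budget=2)` activates only tokens 0 and 2 for that decoding step. In plaintext, the sorting and top-k steps are cheap. Under MPC, however, comparisons over secret-shared scores are costly, which is one reason such plaintext heuristics cannot be reused directly for private inference.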

Videos

MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference (SlidesLive)

Taxonomy

Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques