Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

Jiaming Yang; Chenwei Tang; Liangli Zhen; Jiancheng Lv

arXiv:2604.25975 · cs.LG · April 30, 2026

TL;DR

This paper introduces CapKV, a theoretically grounded key-value cache eviction method based on the Information Bottleneck principle, improving long-context generation efficiency in large language models.

Contribution

The paper derives a mutual-information objective for cache eviction and proposes CapKV, a capacity-aware method that it reports outperforms heuristic strategies.

Findings

01 CapKV achieves a better trade-off between memory use and fidelity across models.

02 Theoretical analysis unifies existing eviction strategies under a single capacity-maximization framework.

03 Experiments on long-context benchmarks show CapKV outperforming existing eviction methods.

Abstract

Key-value (KV) caching is essential for large language model inference, yet its memory overhead poses a critical bottleneck for long-context generation. Existing eviction policies predominantly rely on empirical heuristics, lacking a rigorous theoretical foundation. This work rethinks KV cache eviction through the lens of the Information Bottleneck principle. Under a linear-Gaussian surrogate of attention, we derive a closed-form mutual information objective that characterizes the effective information capacity of a retained KV cache subset. This formulation reveals that a wide range of existing eviction strategies can be interpreted as different approximations of the same capacity-maximization principle. Guided by this insight, we introduce CapKV, a capacity-aware eviction method that directly targets information preservation via a log-determinant approximation using statistical…
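
To make the capacity idea concrete, here is a minimal sketch of log-determinant subset selection under a generic linear-Gaussian channel. It is not the CapKV algorithm, whose details are not given in the abstract. The sketch assumes the retained keys form a matrix K_S observed through additive Gaussian noise of variance `sigma2`, so that the information capacity is 0.5 * log det(I + K_S^T K_S / sigma2), and it picks keys greedily by marginal gain. The function names, the noise parameter, and the greedy strategy are all illustrative assumptions.

```python
import numpy as np


def logdet_capacity(keys, sigma2=1.0):
    """Capacity 0.5 * log det(I + K^T K / sigma2) of retained keys, shape (m, d).

    Assumption: a generic linear-Gaussian surrogate, not the paper's exact objective.
    """
    d = keys.shape[1]
    gram = np.eye(d) + keys.T @ keys / sigma2
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * logdet


def greedy_select(keys, budget, sigma2=1.0):
    """Greedily pick `budget` rows of `keys` that maximize the log-det capacity.

    Hypothetical helper for illustration. Adding key k to a set with matrix
    A = I + K_S^T K_S / sigma2 raises the capacity by
    0.5 * log(1 + k^T A^{-1} k / sigma2) (matrix determinant lemma), so each
    step picks the key with the largest quadratic form k^T A^{-1} k.
    """
    n, d = keys.shape
    a_inv = np.eye(d)  # inverse of A for the empty set
    remaining = list(range(n))
    selected = []
    for _ in range(min(budget, n)):
        cand = keys[remaining]
        quad = np.einsum("id,de,ie->i", cand, a_inv, cand) / sigma2
        best = remaining.pop(int(np.argmax(quad)))
        selected.append(best)
        # Sherman-Morrison rank-one update of A^{-1} after adding the key.
        k = keys[best]
        ak = a_inv @ k
        a_inv = a_inv - np.outer(ak, ak) / (sigma2 + k @ ak)
    return selected


if __name__ == "__main__":
    # Two identical large-norm keys and one small key in a new direction.
    keys = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
    kept = greedy_select(keys, budget=2)
    print(kept, logdet_capacity(keys[kept]))
```

In this toy case a norm-based heuristic would keep both large keys, while the log-det criterion keeps one of them plus the small key in the new direction, because a duplicate key adds little capacity once its direction is already covered.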
