Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective
Jiaming Yang, Chenwei Tang, Liangli Zhen, Jiancheng Lv

TL;DR
This paper introduces CapKV, a theoretically grounded key-value cache eviction method based on the Information Bottleneck principle, improving long-context generation efficiency in large language models.
Contribution
It derives a mutual information-based objective for cache eviction and proposes CapKV, a capacity-aware method that outperforms heuristic strategies.
Findings
CapKV achieves a better memory-fidelity trade-off than heuristic eviction strategies across models.
Theoretical analysis unifies existing eviction strategies under a capacity-maximization framework.
Extensive experiments demonstrate CapKV's superior performance on long-context benchmarks.
Abstract
Key-value (KV) caching is essential for large language model inference, yet its memory overhead poses a critical bottleneck for long-context generation. Existing eviction policies predominantly rely on empirical heuristics, lacking a rigorous theoretical foundation. This work rethinks KV cache eviction through the lens of the Information Bottleneck principle. Under a linear-Gaussian surrogate of attention, we derive a closed-form mutual information objective that characterizes the effective information capacity of a retained KV cache subset. This formulation reveals that a wide range of existing eviction strategies can be interpreted as different approximations of the same capacity-maximization principle. Guided by this insight, we introduce CapKV, a capacity-aware eviction method that directly targets information preservation via a log-determinant approximation using statistical…
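To make the capacity-maximization idea concrete, the following is a minimal sketch and not the paper's CapKV algorithm, whose details are not given in the truncated abstract above. It assumes a generic Gaussian-channel capacity of the form 0.5 * logdet(I + K_S^T K_S / sigma^2) over the retained keys K_S and selects keys greedily. The function names `capacity` and `greedy_select`, the noise parameter `sigma2`, and the greedy procedure itself are illustrative assumptions.

```python
import numpy as np


def capacity(keys: np.ndarray, sigma2: float = 1.0) -> float:
    """Assumed surrogate: 0.5 * logdet(I + K^T K / sigma2) for retained keys (rows)."""
    d = keys.shape[1]
    _, logdet = np.linalg.slogdet(np.eye(d) + keys.T @ keys / sigma2)
    return 0.5 * logdet


def greedy_select(keys: np.ndarray, budget: int, sigma2: float = 1.0) -> list[int]:
    """Greedily pick up to `budget` row indices that maximize the surrogate capacity.

    Hypothetical illustration only; this is not the authors' eviction rule.
    """
    n, d = keys.shape
    a_inv = np.eye(d)  # inverse of (I + K_S^T K_S / sigma2) for the current subset S
    remaining = list(range(n))
    chosen: list[int] = []
    for _ in range(min(budget, n)):
        cand = keys[remaining]
        # Matrix determinant lemma: the gain of adding key k is
        # 0.5 * log(1 + k^T A^{-1} k / sigma2), which is monotone in the quadratic form.
        quad = np.einsum("ij,jk,ik->i", cand, a_inv, cand) / sigma2
        best = int(np.argmax(quad))
        k = cand[best]
        # Sherman-Morrison rank-one update of the inverse.
        ak = a_inv @ k
        a_inv = a_inv - np.outer(ak, ak) / (sigma2 + k @ ak)
        chosen.append(remaining.pop(best))
    return chosen


if __name__ == "__main__":
    # Two identical keys and one orthogonal key: a redundant key adds little capacity.
    keys = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    kept = greedy_select(keys, budget=2)
    print(kept, capacity(keys[kept]), capacity(keys[[0, 1]]))
```

In this toy case the greedy rule keeps one copy of the duplicated key plus the orthogonal key, since a second copy of the same direction contributes less to the log-determinant than a new direction does. A real eviction policy would also need to account for queries, values, and per-head structure, none of which this sketch models.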
