KVSlimmer: Theoretical Insights and Practical Optimizations for Asymmetric KV Merging
Lianjun Liu, Hongli An, Weiqi Yan, Xin Du, Shengchuan Zhang, Huazhong Liu, Yunshan Zhong

TL;DR
KVSlimmer introduces a theoretically grounded, gradient-free algorithm for efficient asymmetric KV merging in large language models, significantly reducing memory and latency while improving performance.
Contribution
It provides a novel theoretical framework for KV asymmetry and develops an exact, gradient-free algorithm that outperforms existing methods in practical benchmarks.
Findings
Outperforms SOTA methods on multiple benchmarks.
Reduces memory costs by 29% and latency by 28%.
Improves Llama3.1-8B-Instruct LongBench score by 0.92.
Abstract
The growing computational and memory demands of the Key-Value (KV) cache significantly limit the ability of Large Language Models (LLMs). While KV merging has emerged as a promising solution, existing methods rely on empirical observations of KV asymmetry and on gradient-based Hessian approximations; they therefore lack a theoretical foundation and suffer from suboptimal compression and added inference overhead. To bridge these gaps, we establish a theoretical framework that characterizes this asymmetry through the spectral energy distribution of projection weights, demonstrating that concentrated spectra in Query/Key weights induce feature homogeneity, whereas dispersed spectra in Value weights preserve heterogeneity. Then, we introduce KVSlimmer, an efficient algorithm that captures exact Hessian information through a mathematically exact formulation, and derives a closed-form solution utilizing only…
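As a rough illustration of what "concentrated" versus "dispersed" spectral energy can mean for a projection weight, the sketch below measures the share of squared singular-value energy held by the top few singular values. This is not the paper's method or metric: the function name `spectral_energy_ratio`, the choice of `top_k`, and the two synthetic matrices are all assumptions made for illustration, and no real model weights are used.

```python
import numpy as np


def spectral_energy_ratio(weight: np.ndarray, top_k: int) -> float:
    """Share of squared singular-value energy held by the top_k singular values.

    Hypothetical helper for illustration; values near 1 indicate a concentrated
    spectrum, lower values indicate a more dispersed one.
    """
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    energy = singular_values ** 2
    return float(energy[:top_k].sum() / energy.sum())


rng = np.random.default_rng(0)
d = 64

# Synthetic stand-in for a "concentrated" spectrum: rank-4 matrix plus small noise.
low_rank = rng.standard_normal((d, 4)) @ rng.standard_normal((4, d))
w_concentrated = low_rank + 0.01 * rng.standard_normal((d, d))

# Synthetic stand-in for a "dispersed" spectrum: dense Gaussian matrix.
w_dispersed = rng.standard_normal((d, d))

r_conc = spectral_energy_ratio(w_concentrated, top_k=4)
r_disp = spectral_energy_ratio(w_dispersed, top_k=4)

print(f"top-4 energy share, concentrated stand-in: {r_conc:.4f}")
print(f"top-4 energy share, dispersed stand-in:    {r_disp:.4f}")
```

Under these assumptions, the low-rank stand-in keeps almost all of its energy in the top four singular values, while the Gaussian stand-in spreads it across many. Applying the same measurement to the actual Query, Key, and Value projection matrices of a model would be one way to probe the asymmetry the abstract describes, though the paper's own analysis may define spectral concentration differently.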
Taxonomy
Topics: Caching and Content Delivery · Natural Language Processing Techniques · Big Data and Digital Economy
