KVSlimmer: Theoretical Insights and Practical Optimizations for Asymmetric KV Merging

Lianjun Liu; Hongli An; Weiqi Yan; Xin Du; Shengchuan Zhang; Huazhong Liu; Yunshan Zhong

arXiv:2603.00907 · cs.CL · March 10, 2026

Open Access

TL;DR

KVSlimmer introduces a theoretically grounded, gradient-free algorithm for efficient asymmetric KV merging in large language models, significantly reducing memory and latency while improving performance.

Contribution

The paper provides a theoretical framework that explains KV asymmetry and develops an exact, gradient-free merging algorithm that outperforms existing methods on practical benchmarks.
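The page does not give KVSlimmer's closed-form formula. As a generic illustration of what a "closed-form" merge under a second-order model means, the sketch below merges two cached vectors by minimizing a quadratic surrogate with *diagonal*, per-dimension curvatures. This is an assumption made for illustration, not the paper's derivation, and `hessian_weighted_merge` and `surrogate_loss` are hypothetical names.

```python
from typing import List, Sequence


def hessian_weighted_merge(k1: Sequence[float], k2: Sequence[float],
                           h1: Sequence[float], h2: Sequence[float]) -> List[float]:
    """Merge two vectors under a diagonal second-order (quadratic) surrogate.

    Minimizes  sum_d h1[d] * (k[d] - k1[d])**2 + h2[d] * (k[d] - k2[d])**2,
    whose closed-form minimizer is the per-dimension curvature-weighted mean.
    """
    if not (len(k1) == len(k2) == len(h1) == len(h2)):
        raise ValueError("all inputs must have the same length")
    merged = []
    for a, b, ha, hb in zip(k1, k2, h1, h2):
        if ha < 0 or hb < 0 or ha + hb == 0:
            raise ValueError("curvatures must be non-negative and not both zero")
        merged.append((ha * a + hb * b) / (ha + hb))
    return merged


def surrogate_loss(k, k1, k2, h1, h2) -> float:
    """Value of the quadratic surrogate at a candidate merged vector k."""
    return sum(ha * (x - a) ** 2 + hb * (x - b) ** 2
               for x, a, b, ha, hb in zip(k, k1, k2, h1, h2))


k1, k2 = [1.0, 0.0], [3.0, 4.0]
h1, h2 = [1.0, 3.0], [1.0, 1.0]  # made-up curvatures, for illustration only
merged = hessian_weighted_merge(k1, k2, h1, h2)
print(merged)                                      # [2.0, 1.0]
print(surrogate_loss(merged, k1, k2, h1, h2))      # 14.0
print(surrogate_loss([2.0, 2.0], k1, k2, h1, h2))  # 18.0 (plain average)
```

With equal curvatures the result reduces to plain averaging. With full (non-diagonal) curvature matrices the same objective has the minimizer $(H_1 + H_2)^{-1}(H_1 k_1 + H_2 k_2)$; the diagonal assumption simply makes that solve per-dimension.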

Findings

01 Outperforms state-of-the-art (SOTA) methods on multiple benchmarks.

02 Reduces memory costs by 29% and latency by 28%.

03 Improves the LongBench score of Llama3.1-8B-Instruct by 0.92 points.
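The page does not say what the 29% memory figure is measured against. For context, the KV cache itself grows linearly with sequence length, and the sketch below computes its footprint. The Llama-3.1-8B attention settings (32 layers, 8 KV heads, head dimension 128, 2-byte entries) are assumed from the public model configuration, not taken from the paper, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values (factor 2) for every layer, KV head and position."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Assumed attention config for Llama-3.1-8B (grouped-query attention):
# 32 layers, 8 KV heads, head dimension 128, 16-bit (2-byte) cache entries.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=1)
full_context = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=128 * 1024)

print(per_token // 1024, "KiB per token")           # 128 KiB per token
print(full_context / 2**30, "GiB for 128K tokens")  # 16.0 GiB for 128K tokens
```

Because every cached token costs the same fixed number of bytes, merging cached entries reduces this term in direct proportion to the number of entries removed.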

Abstract

The growing computational and memory demands of the Key-Value (KV) cache significantly limit the ability of Large Language Models (LLMs). While KV merging has emerged as a promising solution, existing methods that rely on empirical observations of KV asymmetry and gradient-based Hessian approximations lack a theoretical foundation, yield suboptimal compression, and incur inference overhead. To bridge these gaps, we establish a theoretical framework that characterizes this asymmetry through the spectral energy distribution of projection weights, demonstrating that concentrated spectra in Query/Key weights induce feature homogeneity, whereas dispersed spectra in Value weights preserve heterogeneity. Then, we introduce KVSlimmer, an efficient algorithm that captures exact Hessian information through a mathematically exact formulation, and derives a closed-form solution utilizing only…
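The abstract links KV asymmetry to how concentrated the spectral (singular-value) energy of the projection weights is. The page does not give the paper's exact definitions, so the sketch below uses two assumed proxies: the fraction of squared singular values in the top-k directions, and the mean absolute pairwise cosine similarity between projected token features as a homogeneity measure. The matrices are synthetic toys, not real Llama weights, and the function names are hypothetical.

```python
import numpy as np


def topk_energy_ratio(W: np.ndarray, k: int) -> float:
    """Fraction of the squared Frobenius norm carried by the k largest singular values."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values in descending order
    energy = s ** 2
    return float(energy[:k].sum() / energy.sum())


def mean_abs_pairwise_cosine(F: np.ndarray) -> float:
    """Average |cosine similarity| over all distinct pairs of rows (tokens) of F."""
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    S = np.abs(Fn @ Fn.T)
    n = S.shape[0]
    return float((S.sum() - np.trace(S)) / (n * (n - 1)))


rng = np.random.default_rng(0)
d, n_tokens = 64, 256
X = rng.standard_normal((n_tokens, d))  # synthetic stand-in for hidden states

# Toy "concentrated" projection: one dominant rank-1 direction plus small noise.
u = rng.standard_normal((d, 1))
v = rng.standard_normal((1, d))
W_conc = 10.0 * (u @ v) + 0.1 * rng.standard_normal((d, d))
# Toy "dispersed" projection: i.i.d. Gaussian matrix, energy spread over many directions.
W_disp = rng.standard_normal((d, d))

for name, W in [("concentrated", W_conc), ("dispersed", W_disp)]:
    F = X @ W  # projected token features
    print(f"{name:>12}: top-1 energy = {topk_energy_ratio(W, 1):.3f}, "
          f"mean |cos| = {mean_abs_pairwise_cosine(F):.3f}")
```

On this toy data, the concentrated projection places almost all of its energy in one direction and maps tokens onto nearly collinear features, while the dispersed projection does neither. The absolute value is used because the sign of each token's component along the dominant direction can flip.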

Taxonomy

Topics: Caching and Content Delivery · Natural Language Processing Techniques · Big Data and Digital Economy