CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
Luning Wang, Shiyao Li, Xuefei Ning, Zhihang Yuan, Shengen Yan, Guohao Dai, Yu Wang

TL;DR
CSKV introduces a low-rank, channel-shrinking method for KV cache compression in LLMs, significantly reducing memory overhead with minimal performance loss and low training costs.
Contribution
The paper proposes a novel, training-efficient channel shrinking technique that uses low-rank decomposition to compress the KV cache in long-context LLMs, achieving up to 95% memory reduction when combined with quantization.
Findings
Reduces KV cache memory by 80% while maintaining performance
Achieves up to 95% compression when combined with quantization
Maintains long-context capabilities with minimal retraining effort
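As a rough illustration of how the 80% and 95% figures above could compose, the sketch below estimates KV cache size for a hypothetical model. The configuration (32 layers, 32 KV heads, head dimension 128, 32K tokens), the 16-bit baseline, and the 4-bit quantization width are all assumptions chosen for the arithmetic, not values taken from this summary.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bits=16, keep_ratio=1.0):
    """Estimate KV cache size in bytes.

    keep_ratio is the fraction of channels kept after channel shrinking
    (1.0 means no shrinking). All parameter names are illustrative.
    """
    channels = round(n_kv_heads * head_dim * keep_ratio)
    # Two tensors (K and V) per layer, each of shape seq_len x channels.
    elems = 2 * n_layers * channels * seq_len * batch
    return elems * bits // 8


# Hypothetical 7B-class configuration with a 32K-token context.
cfg = dict(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32768)

base = kv_cache_bytes(**cfg)                            # 16-bit, all channels
shrunk = kv_cache_bytes(**cfg, keep_ratio=0.2)          # keep 20% of channels
both = kv_cache_bytes(**cfg, keep_ratio=0.2, bits=4)    # plus assumed 4-bit values

red_shrink = 1 - shrunk / base   # about 0.80
red_both = 1 - both / base       # about 0.95

print(f"baseline: {base / 2**30:.1f} GiB")              # 16.0 GiB
print(f"channel shrinking only: {red_shrink:.1%} smaller")
print(f"shrinking + 4-bit: {red_both:.1%} smaller")
```

Under these assumptions, keeping one fifth of the channels and storing them at a quarter of the bit width leaves roughly 0.2 x 0.25 = 5% of the original cache.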
Abstract
Large Language Models (LLMs) have been widely adopted to process long-context tasks. However, the large memory overhead of the key-value (KV) cache poses significant challenges in long-context scenarios. Existing training-free KV cache compression methods typically focus on quantization and token pruning, which have compression limits, and excessive sparsity can lead to severe performance degradation. Other methods design new architectures with less KV overhead but require significant training overhead. To address the above two drawbacks, we further explore the redundancy in the channel dimension and apply an architecture-level design with minor training costs. Therefore, we introduce CSKV, a training-efficient Channel Shrinking technique for KV cache compression: (1) We first analyze the singular value distribution of the KV cache, revealing significant redundancy and compression…
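The singular value analysis mentioned in the abstract can be sketched with a generic truncated SVD. This is a minimal illustration under stated assumptions, not the authors' procedure: the key cache here is synthetic (built to be approximately rank 8), the sizes and the target rank of 16 are arbitrary, and the helper name `shrink_channels` is hypothetical.

```python
import numpy as np


def shrink_channels(K, rank):
    """Project a (tokens x channels) cache onto its top-`rank` right singular vectors.

    Returns the compressed cache, the projection matrix, and the singular values.
    """
    _, S, Vt = np.linalg.svd(K, full_matrices=False)
    P = Vt[:rank].T              # (channels, rank) projection basis
    return K @ P, P, S


rng = np.random.default_rng(0)
n_tokens, d_kv, true_rank = 200, 64, 8

# Synthetic key cache: low-rank signal plus small noise (an assumption,
# standing in for the redundancy the abstract reports in real KV caches).
K = rng.standard_normal((n_tokens, true_rank)) @ rng.standard_normal((true_rank, d_kv))
K += 0.01 * rng.standard_normal((n_tokens, d_kv))

K_small, P, S = shrink_channels(K, rank=16)

# Fraction of squared singular-value mass captured by the first k values.
energy = np.cumsum(S**2) / np.sum(S**2)

# Reconstruct and measure the relative error introduced by shrinking.
K_rec = K_small @ P.T
rel_err = np.linalg.norm(K - K_rec) / np.linalg.norm(K)

print("cached shape:", K_small.shape)   # (200, 16) instead of (200, 64)
print("energy in top 8 singular values:", float(energy[7]))
print("relative reconstruction error:", float(rel_err))
```

When the singular values decay quickly, as in this synthetic case, a small number of channels retains nearly all of the information, which is the kind of redundancy a channel-shrinking method can exploit.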
Taxonomy
Topics: Advanced Wireless Network Optimization · Caching and Content Delivery · IPv6, Mobility, Handover, Networks, Security
