KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs
Jian Chen, Zhuoran Wang, Jiayu Qin, Ming Li, Meng Wang, Changyou Chen, Yin Chen, Qizhen Weng, Yirui Liu

TL;DR
This paper introduces KV-CoRE, a novel SVD-based method for evaluating the data-dependent low-rank compressibility of KV-caches in large language models, and provides a large-scale benchmark and insights for efficient, data-aware cache compression.
Contribution
We present KV-CoRE, the first systematic, gradient-free, incremental method for quantifying KV-cache compressibility across models and datasets, enabling large-scale analysis.
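The gradient-free, incremental evaluation can be illustrated with a minimal sketch. It assumes (our assumption; the exact update is not given here) that the keys or values of one layer and head arrive in batches and are folded into a running d×d Gram matrix XᵀX, whose eigenvalues are the squared singular values of the full stacked cache matrix X. `GramAccumulator` is a hypothetical name and the data are synthetic.

```python
import numpy as np


class GramAccumulator:
    """Hypothetical helper: streams cache rows for one layer/head and keeps only X^T X."""

    def __init__(self, dim: int) -> None:
        self.gram = np.zeros((dim, dim), dtype=np.float64)  # running X^T X
        self.rows = 0                                       # tokens seen so far

    def update(self, batch: np.ndarray) -> None:
        """Add a (tokens, dim) batch of keys or values; no gradients, no stored history."""
        batch = np.asarray(batch, dtype=np.float64)
        self.gram += batch.T @ batch
        self.rows += batch.shape[0]

    def singular_values(self) -> np.ndarray:
        """Singular values of the full stacked matrix X, in descending order.

        Returns `dim` values; if fewer tokens than `dim` were seen, the trailing
        values are (numerically) zero.
        """
        eigvals = np.linalg.eigvalsh(self.gram)[::-1]  # eigvalsh returns ascending order
        return np.sqrt(np.clip(eigvals, 0.0, None))    # clip tiny negative round-off


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acc = GramAccumulator(dim=8)
    chunks = [rng.standard_normal((16, 8)) for _ in range(4)]  # synthetic stand-in for cache batches
    for chunk in chunks:
        acc.update(chunk)
    exact = np.linalg.svd(np.vstack(chunks), compute_uv=False)
    print(np.allclose(acc.singular_values(), exact))  # True
```

Keeping only a d×d matrix makes memory independent of dataset size. The trade-off is that forming XᵀX squares the condition number, so very small singular values lose precision compared with a direct SVD.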
Findings
Compressibility varies systematically with model architecture and language.
Normalized Effective Rank (NER) correlates with performance loss under compression.
Large-scale benchmark reveals patterns linking data, model, and cache efficiency.
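NER is not defined in this summary. As a hedged illustration only, the sketch below assumes an entropy-based effective rank in the style of Roy and Vetterli, exp(H(p)) with p_i = σ_i / Σ_j σ_j, divided by the number of singular values k so that it lies in [1/k, 1]. Lower values indicate a spectrum concentrated in a few directions, i.e. a more compressible cache. The function name is hypothetical, and the paper's exact NER formula may differ.

```python
import numpy as np


def normalized_effective_rank(singular_values, eps: float = 1e-12) -> float:
    """Entropy-based effective rank divided by the number of singular values.

    Assumption: Roy-Vetterli effective rank exp(H(p)) with
    p_i = sigma_i / sum_j sigma_j, normalized to [1/k, 1].
    The paper's exact NER definition may differ.
    """
    s = np.asarray(singular_values, dtype=np.float64)
    p = s / max(float(s.sum()), eps)   # normalized spectrum (a probability vector)
    p = p[p > 0]                       # convention: 0 * log(0) = 0
    entropy = -np.sum(p * np.log(p))   # Shannon entropy in nats
    return float(np.exp(entropy) / s.size)


if __name__ == "__main__":
    print(normalized_effective_rank([1.0, 1.0, 1.0, 1.0]))  # ≈ 1.0: flat spectrum, hard to compress
    print(normalized_effective_rank([5.0, 0.0, 0.0, 0.0]))  # 0.25: rank-1, highly compressible
```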
Abstract
Large language models rely on KV-caches to avoid redundant computation during autoregressive decoding, but as context length grows, reading and writing the cache can quickly saturate GPU memory bandwidth. Recent work has explored KV-cache compression, yet most approaches neglect the data-dependent nature of KV-caches and their variation across layers. We introduce KV-CoRE (KV-cache Compressibility by Rank Evaluation), an SVD-based method for quantifying the data-dependent low-rank compressibility of KV-caches. KV-CoRE computes the optimal low-rank approximation under the Frobenius norm and, because it is gradient-free and incremental, enables efficient dataset-level, layer-wise evaluation. Using this method, we analyze multiple models and datasets spanning five English domains and sixteen languages, uncovering systematic patterns that link compressibility to model architecture, training data,…
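For reference, the Frobenius-norm optimality mentioned above is the Eckart–Young–Mirsky theorem. Write X ∈ ℝ^{n×d} for the keys (or values) of one layer stacked over a dataset, with singular triplets (σ_i, u_i, v_i), σ_1 ≥ σ_2 ≥ ... ≥ σ_k and k = min(n, d). The relative-energy threshold τ used below to pick a rank r is an illustrative choice of ours, not necessarily the paper's rank-selection rule.

```latex
\begin{aligned}
\min_{\operatorname{rank}(Y) \le r} \lVert X - Y \rVert_F^2
  &= \lVert X - X_r \rVert_F^2
   = \sum_{i=r+1}^{k} \sigma_i^2,
  \qquad X_r = \sum_{i=1}^{r} \sigma_i \, u_i v_i^\top, \\
\varepsilon(r) &= \frac{\sum_{i=r+1}^{k} \sigma_i^2}{\sum_{i=1}^{k} \sigma_i^2},
  \qquad r^\star(\tau) = \min\{\, r : \varepsilon(r) \le \tau \,\}.
\end{aligned}
```

Because the σ_i² are exactly the eigenvalues of XᵀX, this error can be evaluated from an accumulated d×d Gram matrix without ever storing X.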
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper shifts the focus from weight-based to data-dependent compressibility of KV-caches.
- Its methodological quality is high: the approach is mathematically grounded in the Eckart–Young–Mirsky theorem, implements a computationally efficient incremental SVD algorithm, and provides clear optimality guarantees under the Frobenius norm.
- The work is also clear and well-structured, with precise notation, intuitive illustrations, and rigorous yet accessible derivations.
- Although KV-CoRE claims to be computationally efficient, the paper provides no empirical measurements of runtime, memory consumption, or throughput improvements compared to baseline SVD or Cholesky-based methods (e.g., SVD-LLM, Wang et al., 2024).
- The method introduces choices such as batch size, covariance update frequency, and rank-selection strategy, yet their effects on accuracy and stability are unexplored. A brief ablation could clarify robustness and guide practical deployment.
- Whil…
1. NER is experimentally verified to be a successful compressibility predictor and can be used as an indicator of how much a chunk of the KV-cache can be low-rank approximated or compressed.
2. Another major contribution of this work is the insight drawn from the experiments: KV-cache compressibility is layer- and data-dependent, so KV-cache compression should be layer-wise and data-aware.
1. Introducing or defining metrics for compressibility analysis is essentially the core of many related low-rank compression methods. This paper focuses on compressibility analysis but does not go on to actually compress models or report compression results. I would consider this work yet another metric-definition paper: the novelty is not significant and the impact is not guaranteed.
2. Alternatively, the authors need to reveal the superiority of the proposed NE…
* Interesting insight on using different datasets to incrementally improve the SVD calculation.
* The NER metric is introduced as a principled way to measure compressibility and is made interpretable by tying it to known metrics such as perplexity and GPT score.
* *Keys are consistently more compressible than values*: I wish the paper provided some intuition for why this is the case beyond empirical evidence alone.
* The evaluation is constrained to a smaller family of models (fewer than 10B parameters). It is also unclear whether techniques like grouped-query attention, which have a greater effect in larger models, would affect the measurements.
1. Provides a data-dependent, SVD-based framework for analyzing KV-cache compressibility with NER.
2. Demonstrates the interesting result that NER correlates strongly with real performance metrics (perplexity, GPT score).
3. Establishes the first large-scale benchmark for KV-cache compressibility across models and domains.
1. No improvement over previous methods; the work reads more like a case study.
2. The delta relative to existing SVD-based methods is not clear.
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Network Packet Processing and Optimization
