Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression
Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li, Xiuze Zhou, Bo Li, Xuming Hu, Xiaowen Chu

TL;DR
This paper introduces KVFundaBench, a benchmark for measuring how KV cache compression affects reasoning tasks. It finds that aggressive compression severely degrades reasoning coherence, and it proposes ShotKV, which preserves semantic integrity to improve accuracy and reduce latency.
Contribution
The paper systematically benchmarks the effects of KV cache compression on reasoning tasks and proposes ShotKV, a method that preserves few-shot examples as intact semantic units, improving reasoning accuracy and reducing latency.
Findings
Reasoning tasks suffer severe degradation under aggressive cache compression.
Specialized attention patterns in DeepSeek-R1 reveal reasoning chain fragility.
ShotKV improves long-context generation accuracy by 9-18% and reduces latency by 11%.
Abstract
While Key-Value (KV) cache compression is essential for efficient LLM inference, current evaluations disproportionately focus on sparse retrieval tasks, potentially masking the degradation of High-Density Reasoning where Chain-of-Thought (CoT) coherence is critical. We introduce KVFundaBench to systematically evaluate this gap, revealing a sharp dichotomy: while retrieval tasks remain robust, reasoning tasks exhibit severe Task-Dependent Degradation under aggressive compression due to disrupted CoT links. Extending our analysis to the DeepSeek-R1 model, we uncover that its specialized attention patterns offer unique insights into the fragility of reasoning chains. Guided by these findings -- specifically the necessity of preserving few-shot examples as indivisible Semantic Units -- we propose ShotKV. This approach explicitly separates prefill and decoding phases to prioritize semantic…
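The idea of treating each few-shot example as an indivisible Semantic Unit can be illustrated with a small sketch. This is not the authors' ShotKV implementation: the function name `select_shots`, the per-token importance scores, the mean-score ranking, and the greedy budget fill are all assumptions made for illustration. The only property it is meant to show is that a shot is kept whole or dropped whole, never partially evicted.

```python
from typing import List, Tuple


def select_shots(
    token_scores: List[float],
    shot_spans: List[Tuple[int, int]],
    budget: int,
) -> List[int]:
    """Return sorted token indices to keep, choosing whole shots only.

    token_scores: one hypothetical importance score per prompt token
        (for example, accumulated attention mass).
    shot_spans: non-empty half-open (start, end) token ranges, one per shot.
    budget: maximum number of tokens that may be kept.
    """

    def mean_score(i: int) -> float:
        start, end = shot_spans[i]
        return sum(token_scores[start:end]) / (end - start)

    # Rank shots by mean score so long shots are not favoured by length alone.
    ranked = sorted(range(len(shot_spans)), key=mean_score, reverse=True)

    kept: List[int] = []
    used = 0
    for i in ranked:
        start, end = shot_spans[i]
        length = end - start
        # A shot is kept in full or skipped in full; it is never truncated.
        if used + length <= budget:
            kept.extend(range(start, end))
            used += length
    return sorted(kept)


if __name__ == "__main__":
    scores = [0.1, 0.1, 0.1, 0.9, 0.8, 0.5, 0.5, 0.5, 0.5]
    spans = [(0, 3), (3, 5), (5, 9)]  # three shots of 3, 2, and 4 tokens
    print(select_shots(scores, spans, budget=6))  # [3, 4, 5, 6, 7, 8]
```

A token-level top-k policy with the same budget could keep fragments of all three shots; the shot-level policy above instead returns only complete spans, which is the contrast the abstract draws between semantic-unit preservation and ordinary eviction.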
