Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression

Xiang Liu; Zhenheng Tang; Hong Chen; Peijie Dong; Zeyu Li; Xiuze Zhou; Bo Li; Xuming Hu; Xiaowen Chu

arXiv:2502.01941·cs.CL·May 13, 2026

TL;DR

This paper introduces KVFundaBench, a benchmark for evaluating how KV cache compression affects reasoning tasks. It finds significant degradation in reasoning coherence under compression and proposes ShotKV, which preserves semantic integrity to improve accuracy and efficiency.

Contribution

The paper systematically benchmarks the effects of KV cache compression on reasoning tasks and proposes ShotKV, a method that preserves semantic units to improve reasoning accuracy and reduce latency.

Findings

01

Reasoning tasks degrade severely under aggressive KV cache compression, while retrieval tasks remain robust.

02

Specialized attention patterns in DeepSeek-R1 reveal reasoning chain fragility.

03

ShotKV improves long-context generation accuracy by 9-18% and reduces latency by 11%.

Abstract

While Key-Value (KV) cache compression is essential for efficient LLM inference, current evaluations disproportionately focus on sparse retrieval tasks, potentially masking the degradation of High-Density Reasoning where Chain-of-Thought (CoT) coherence is critical. We introduce KVFundaBench to systematically evaluate this gap, revealing a sharp dichotomy: while retrieval tasks remain robust, reasoning tasks exhibit severe Task-Dependent Degradation under aggressive compression due to disrupted CoT links. Extending our analysis to the DeepSeek-R1 model, we uncover that its specialized attention patterns offer unique insights into the fragility of reasoning chains. Guided by these findings -- specifically the necessity of preserving few-shot examples as indivisible Semantic Units -- we propose ShotKV. This approach explicitly separates prefill and decoding phases to prioritize semantic…
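The abstract names the idea of keeping few-shot examples as indivisible Semantic Units, but the text here is cut off before it describes how ShotKV scores or selects them. The following is only a minimal sketch of shot-level selection under stated assumptions, not the paper's algorithm: the function name `select_shots`, the per-token importance scores, the mean-score ranking, and the greedy budget fill are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def select_shots(
    token_scores: List[float],
    shot_spans: List[Tuple[int, int]],
    budget: int,
) -> List[int]:
    """Return sorted token indices to keep, retaining whole shots only.

    token_scores: hypothetical per-token importance (e.g. accumulated attention).
    shot_spans:   half-open [start, end) token ranges, one per few-shot example;
                  each span is assumed to be non-empty.
    budget:       maximum number of prompt tokens to keep in the cache.
    """

    def mean_score(span: Tuple[int, int]) -> float:
        start, end = span
        return sum(token_scores[start:end]) / (end - start)

    # Rank shots by mean token importance (an assumed scoring rule).
    ranked = sorted(
        range(len(shot_spans)),
        key=lambda i: mean_score(shot_spans[i]),
        reverse=True,
    )

    kept: List[int] = []
    used = 0
    for i in ranked:
        start, end = shot_spans[i]
        length = end - start
        # Keep a shot only if it fits entirely; never keep part of a shot.
        if used + length <= budget:
            kept.extend(range(start, end))
            used += length
    return sorted(kept)


scores = [0.1, 0.1, 0.9, 0.8, 0.7, 0.2, 0.3]
spans = [(0, 2), (2, 5), (5, 7)]
kept_indices = select_shots(scores, spans, budget=5)
```

In this toy input the second and third shots fit the budget of five tokens and are kept whole, while the lowest-scoring first shot is dropped entirely. A token-level eviction policy given the same budget could instead keep fragments of all three shots, which is the kind of disrupted CoT link the abstract attributes the degradation to.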
