More Than a Quick Glance: Overcoming the Greedy Bias in KV-Cache Compression
Aryan Sood, Tanvi Sharma, Vansh Agrawal

TL;DR
This paper introduces LASER-KV, a KV-cache compression framework for LLMs that overcomes greedy bias and keeps performance stable under strict memory constraints, outperforming previous methods on long-context tasks.
Contribution
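LASER-KV (Layer Accumulated Selection with Exact-LSH Recall) replaces the standard fixed summary size with a block-wise accumulation strategy governed by a protection divisor (n), combined with exact-LSH recall, to compress the KV cache without significant loss of semantic recall.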
Findings
On the Babilong benchmark, previous compression methods degrade performance by 15-30% on various long-context tasks.
LASER-KV maintains stable performance and improves accuracy by up to 10% at 128k context length.
The study challenges the assumption that attention scores are sufficient proxies for token utility.
Abstract
While Large Language Models (LLMs) can theoretically support extensive context windows, their actual deployment is constrained by the linear growth of Key-Value (KV) cache memory. Prevailing compression strategies mitigate this through various pruning mechanisms, yet trade off semantic recall for memory efficiency. In this work, we present LASER-KV (Layer Accumulated Selection with Exact-LSH Recall), a framework designed to test the limits of KV compression under a strict accumulative budgeting policy. We deviate from the standard fixed summary size approach by implementing a block-wise accumulation strategy governed by a protection divisor (n). This allows us to isolate the effects of compression from sliding window artifacts. Our experiments on the Babilong benchmark reveal that previous compression methods degrade performance by 15-30% on various long-context tasks. LASER-KV…
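The linear growth of KV-cache memory mentioned in the abstract can be made concrete with a back-of-the-envelope calculation. The sketch below is not taken from the paper: the function name `kv_cache_bytes` and the model configuration (32 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical, chosen only to illustrate the standard size formula for an uncompressed cache.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, batch_size=1):
    """Size in bytes of an uncompressed KV cache.

    The leading factor 2 accounts for storing one key tensor and one
    value tensor per layer. Every other factor is a model constant,
    so the total grows linearly with seq_len.
    """
    return (2 * batch_size * num_layers * num_kv_heads
            * head_dim * seq_len * bytes_per_elem)


# Hypothetical configuration (an assumption, not the paper's setup):
# 32 layers, 8 KV heads, head_dim 128, 2 bytes per element (fp16/bf16).
for n_tokens in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(n_tokens, 32, 8, 128) / 2**30
    print(f"{n_tokens:>7} tokens -> {gib:.2f} GiB")
```

Under these assumed values each token costs 128 KiB of cache, so a 128k-token context needs 16 GiB for a single sequence, which is the pressure that pruning and compression strategies try to relieve.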
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · Advanced Data Storage Technologies
