On the Limits of Learned Importance Scoring for KV Cache Compression

Brady Steele

arXiv:2601.14279·cs.LG·January 22, 2026

Open Access

TL;DR

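This paper evaluates learned importance scoring for KV cache compression in language models and finds that simple heuristics match or exceed a learned scorer. The author attributes this to limited marginal information in KV representations and hypothesizes that a circular dependence between future queries and generation trajectories contributes to the difficulty.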

Contribution

It introduces Speculative Importance Prediction (SIP), a 1.7M parameter learned scorer that predicts token importance from KV representations alone, and shows that it does not outperform simple heuristics, prefill attention, or random selection across 3 tasks.

Findings

01. Position-based heuristics (keep first 4 + last N tokens) match or exceed learned scorers.

02. Prefill attention provides a signal equivalent to that of complex learned scorers.

03. Limited marginal information in KV representations, beyond position and prefill attention, constrains importance prediction.

Abstract

We investigate learned KV cache compression through Speculative Importance Prediction (SIP), a 1.7M parameter non-query-aware scorer that predicts token importance from KV representations alone. Despite architectural sophistication (multi-horizon lookahead, cross-attention), SIP does not outperform simple baselines, including random selection, across 5 seeds, 4 retention levels, and 3 tasks. Key findings: (1) position-based heuristics (keep first 4 + last N tokens) match or exceed learned approaches; (2) prefill attention provides equivalent signal to complex learned scorers; (3) marginal information in KV representations beyond position and prefill attention appears limited for importance prediction. We hypothesize that circular dependence between future queries and generation trajectories contributes to this difficulty.
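The two baselines named in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the `budget` parameter (the total number of positions retained), and the assumption that prefill attention has already been aggregated into one score per position are all hypothetical choices made here, since the abstract does not specify them.

```python
from typing import List, Sequence


def keep_first_and_last(seq_len: int, budget: int, n_sink: int = 4) -> List[int]:
    """Position heuristic: keep the first `n_sink` positions plus the most
    recent ones, retaining at most `budget` positions in total.

    `budget` and `n_sink` are illustrative names, not taken from the paper.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if budget >= seq_len:
        return list(range(seq_len))  # nothing needs to be evicted
    n_first = min(n_sink, budget)
    n_last = budget - n_first
    first = list(range(n_first))
    last = list(range(seq_len - n_last, seq_len))
    return first + last  # already sorted and non-overlapping


def top_k_by_prefill_attention(attn_mass: Sequence[float], budget: int) -> List[int]:
    """Prefill-attention baseline: keep the `budget` highest-scoring positions.

    `attn_mass[i]` is assumed to be the attention received by position i
    during prefill, already aggregated over heads and query positions.
    """
    budget = max(0, min(budget, len(attn_mass)))
    ranked = sorted(range(len(attn_mass)), key=lambda i: attn_mass[i], reverse=True)
    return sorted(ranked[:budget])  # return in positional order


if __name__ == "__main__":
    print(keep_first_and_last(seq_len=10, budget=6))
    print(top_k_by_prefill_attention([0.1, 0.5, 0.05, 0.3, 0.05], budget=2))
```

Neither function looks at the key or value vectors themselves, which is the point of the comparison: the abstract reports that a learned scorer with access to those representations did not outperform selectors of this kind.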

Taxonomy

Topics: Caching and Content Delivery · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques