Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

Gabriel Garcia

arXiv:2605.18053·cs.LG·May 19, 2026

TL;DR

This paper shows that, in globally capped KV cache eviction for transformer models, reserving cache slots at prompt boundaries (structural protection) matters more than the choice of scoring policy.

Contribution

It shows that structural protection at prompt boundaries dominates the choice of scoring strategy in KV cache eviction and recovers most of the reference quality across multiple models.

Findings

01

Protection recovers 69-90% of reference-ceiling quality on seven LongBench models, and 68-98% across a ten-model panel.

02

Structural protection with boundary guarding outperforms scoring policies at small cache sizes.

03

Faithful per-head scoring provides modest additional gains when combined with protection.

Abstract

We study KV cache eviction under a shared globally capped decode-time harness. Seven policies (LRU, H2O, SnapKV, StreamingLLM, Ada-KV, QUEST, Random) share a prompt-boundary vulnerability: without structural protection, they collapse to near-zero quality on six pure-transformer models (F1 $\leq$ 0.064). Reserving 10\% of cache at each boundary recovers 69--90\% of the $C = 2{,}048$ reference-ceiling quality on seven LongBench models at $C = 256$ (13\% retention); a ten-model panel spans 68--98\%. An attention-mass pilot (Qwen2.5-3B, $N = 30$) suggests why: the position-0 sink holds $\sim 75\%$ of prefix mass, while other boundary tokens sit near $\sim 0.41\times$ uniform expectation, so attention scorers retain the sink but still drop structurally critical tokens. With protection, simplified score-isolation variants are TOST-equivalent to LRU at $K = 32$ ($\Delta = 0.02$); at $K = 8$,…
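To make the protection idea concrete, the following is a minimal sketch of a globally capped cache that evicts in LRU order but never evicts positions reserved at the prompt boundaries. It is not the paper's harness. Several things here are assumptions for illustration: "each boundary" is taken to mean the first and last positions of the prompt, the reserved count is `int(capacity * frac)` per boundary, and the names `protected_positions`, `ProtectedLRUCache`, and `simulate` are hypothetical.

```python
from collections import OrderedDict


def protected_positions(prompt_len, capacity, frac=0.10):
    """Return prompt positions reserved at both prompt boundaries.

    Assumption: each boundary reserves int(capacity * frac) slots,
    taken from the start and the end of the prompt.
    """
    k = min(max(1, int(capacity * frac)), prompt_len)
    head = set(range(k))
    tail = set(range(prompt_len - k, prompt_len))
    return head | tail


class ProtectedLRUCache:
    """Tracks which token positions stay cached under a global cap."""

    def __init__(self, capacity, protected):
        self.capacity = capacity
        self.protected = frozenset(protected)
        self.entries = OrderedDict()  # position -> None, oldest first

    def touch(self, pos):
        """Insert a position (or mark it recently used), then evict."""
        self.entries[pos] = None
        self.entries.move_to_end(pos)
        self._evict()

    def _evict(self):
        # Drop the least recently used unprotected position until the
        # cache fits under the cap. Protected positions are skipped.
        while len(self.entries) > self.capacity:
            victim = next(
                (p for p in self.entries if p not in self.protected), None
            )
            if victim is None:
                break  # only protected positions remain
            del self.entries[victim]


def simulate(prompt_len, decode_len, capacity, frac=0.10):
    """Feed prompt then decode positions through the cache in order."""
    protected = protected_positions(prompt_len, capacity, frac)
    cache = ProtectedLRUCache(capacity, protected)
    for pos in range(prompt_len + decode_len):
        cache.touch(pos)
    return cache, protected


if __name__ == "__main__":
    cache, protected = simulate(prompt_len=1000, decode_len=100, capacity=256)
    kept = sorted(cache.entries)
    print(f"cached: {len(kept)}, protected kept: {len(protected & set(kept))}")
    print(f"retention vs. a 2048-slot cache: {256 / 2048:.1%}")
```

In this sketch the sink at position 0 and the final prompt tokens survive regardless of recency, while plain LRU would have evicted them long before decoding ends. The last line prints 12.5%, which corresponds to the roughly 13% retention figure quoted for $C = 256$ against $C = 2{,}048$.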
