Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction
Gabriel Garcia

TL;DR
This paper shows that, under a global cache cap, structurally protecting tokens at prompt boundaries improves KV cache eviction quality in transformer models more than the choice of scoring policy does.
Contribution
It identifies structural protection as the dominant factor in globally capped KV cache eviction: protection outweighs the choice of scoring strategy and recovers most of the reference quality across multiple models.
Findings
Protection recovers 69-90% of reference-ceiling quality on seven LongBench models at 13% retention, and 68-98% across a ten-model panel.
Structural protection with boundary guarding outperforms scoring policies at small cache sizes.
Faithful per-head scoring provides modest additional gains when combined with protection.
Abstract
We study KV cache eviction under a shared globally capped decode-time harness. Seven policies (LRU, H2O, SnapKV, StreamingLLM, Ada-KV, QUEST, Random) share a prompt-boundary vulnerability: without structural protection, they collapse to near-zero quality on six pure-transformer models (F1 ≤ 0.064). Reserving 10% of cache at each boundary recovers 69-90% of the reference-ceiling quality on seven LongBench models at 13% retention; a ten-model panel spans 68-98%. An attention-mass pilot (Qwen2.5-3B) suggests why: the position-0 sink holds a disproportionate share of prefix mass, while other boundary tokens sit near uniform expectation, so attention scorers retain the sink but still drop structurally critical tokens. With protection, simplified score-isolation variants are TOST-equivalent to LRU at (); at ,…
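The protection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's harness: the function name `protected_evict`, the reading of "each boundary" as the first and last tokens of the cached sequence, and the use of token recency as a stand-in LRU score are all assumptions made for illustration.

```python
from typing import List, Sequence


def protected_evict(scores: Sequence[float], budget: int,
                    protect_frac: float = 0.10) -> List[int]:
    """Return the sorted token indices kept under a global cache budget.

    Hypothetical sketch: a fraction `protect_frac` of the budget is reserved
    at each end of the sequence (assumed to be the "boundaries"); the
    remaining slots go to the highest-scoring tokens in between.
    """
    if not 0.0 <= protect_frac <= 0.5:
        raise ValueError("protect_frac must be in [0, 0.5]")
    n = len(scores)
    if n <= budget:
        return list(range(n))  # everything fits, nothing to evict

    guard = int(budget * protect_frac)   # slots reserved per boundary
    free = budget - 2 * guard            # slots left for the scorer

    head = list(range(guard))            # protected leading tokens
    tail = list(range(n - guard, n))     # protected trailing tokens
    middle = range(guard, n - guard)     # tokens the scorer competes over

    # Keep the top-`free` middle tokens by score (higher means more important).
    ranked = sorted(middle, key=lambda i: scores[i], reverse=True)[:free]
    return sorted(head + ranked + tail)


if __name__ == "__main__":
    # Recency as an LRU-like score: later tokens score higher (an assumption).
    recency = [float(i) for i in range(100)]
    print(protected_evict(recency, budget=20))                    # protected
    print(protected_evict(recency, budget=20, protect_frac=0.0))  # unprotected
```

With `protect_frac=0.0` the recency scorer keeps only the most recent tokens and evicts position 0, whereas the protected call always retains both ends of the sequence regardless of how the scorer ranks them. Any scoring policy can be swapped in through `scores` without changing the protected slots.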
