Reformulating KV Cache Eviction Problem for Long-Context LLM Inference

Tho Mai; Joo-Young Kim

arXiv:2605.07234 · cs.CL · May 11, 2026

TL;DR

This paper introduces LaProx, a new method for KV cache eviction in long-context LLM inference that models token importance more accurately, enabling significant cache reduction with minimal performance loss.

Contribution

It reformulates KV cache eviction as an output-aware, layer-wise matrix approximation problem and proposes a unified, globally comparable importance scoring strategy.

Findings

01 Maintains model performance with only 5% KV cache usage.

02 Outperforms prior methods across 19 datasets on LongBench and Needle-In-A-Haystack.

03 Reduces accuracy loss by up to 2× under extreme cache compression.

Abstract

Large language models (LLMs) support long-context inference but suffer from substantial memory and runtime overhead due to Key-Value (KV) Cache growth. Existing KV Cache eviction methods primarily rely on local attention weights, neglecting the influence of value representations, output projection, and inter-head interactions. In this work, we reformulate KV Cache eviction from a conventional head-wise, weight-averaging approach into an output-aware, layer-wise matrix multiplication approximation problem. We introduce LaProx, a novel eviction strategy that explicitly models the multiplicative interaction between attention maps and projected value states to accurately quantify token contributions while accounting for inter-head dependencies. Building on this metric, we propose the first unified eviction strategy that assigns globally comparable importance scores to tokens, enabling…
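To make the reformulation concrete, here is a minimal NumPy sketch of output-aware, layer-wise token scoring. It is an illustration under stated assumptions, not the paper's LaProx scoring rule, which this abstract does not spell out. The function names (`token_contributions`, `token_scores`, `evict`), the tensor shapes, and the choice of the Frobenius norm as the importance measure are all hypothetical. The idea it shows: the attention layer output is a sum over cached tokens of (attention column) × (value row projected through the output matrix), summed over heads, so a token's importance can be measured by the size of the term it contributes to that sum instead of by its attention weight alone.

```python
import numpy as np


def token_contributions(attn, values, w_o):
    """Per-token contribution to the layer output.

    attn:   (H, Q, T) attention weights per head (queries x cached tokens)
    values: (H, T, d) value states per head
    w_o:    (H, d, D) output projection, split into one block per head
    Returns an array of shape (T, Q, D); summing over axis 0 gives the
    full layer output sum_h attn[h] @ values[h] @ w_o[h].
    """
    # Project each value row through its head's slice of the output matrix.
    projected = np.einsum("htd,hde->hte", values, w_o)  # (H, T, D)
    # For each token t, add up the rank-1 terms attn[h, :, t] (x) projected[h, t]
    # across heads, so inter-head cancellation or reinforcement is captured.
    return np.einsum("hqt,hte->tqe", attn, projected)  # (T, Q, D)


def token_scores(attn, values, w_o):
    """Assumed importance score: Frobenius norm of each token's contribution."""
    contrib = token_contributions(attn, values, w_o)
    return np.linalg.norm(contrib, axis=(1, 2))  # (T,)


def evict(scores, keep_ratio):
    """Return sorted indices of the tokens to keep (at least one)."""
    k = max(1, int(round(keep_ratio * scores.shape[0])))
    keep = np.argsort(scores)[-k:]
    return np.sort(keep)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, Q, T, d, D = 2, 3, 8, 4, 5  # toy sizes, chosen only for the demo
    logits = rng.normal(size=(H, Q, T))
    attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    values = rng.normal(size=(H, T, d))
    w_o = rng.normal(size=(H, d, D))
    scores = token_scores(attn, values, w_o)
    print("scores:", scores)
    print("kept token indices:", evict(scores, keep_ratio=0.5))
```

One consequence of scoring this way: a token that receives high attention but whose projected value is near zero gets a low score, which a purely attention-weight-based criterion would miss. The sketch does not renormalize attention after eviction and does not address how scores are made comparable across layers.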
