ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing

Yongqi An; Chang Lu; Kuan Zhu; Tao Yu; Chaoyang Zhao; Hong Wu; Ming Tang; Jinqiao Wang

arXiv:2605.08840·cs.CL·May 12, 2026



TL;DR

ReST-KV is a KV cache eviction method for large language models. It models how removing each cached token would change the layer output and smooths spatial and temporal variations in KV selection, improving long-context performance and reducing decoding latency.

Contribution

It formulates KV eviction as an output discrepancy minimization problem using layer-wise reconstruction and incorporates spatial-temporal smoothing for robustness.

Findings

01  Outperforms state-of-the-art methods on the LongBench and RULER benchmarks.

02  Achieves a 10.61× reduction in decoding latency at 128k context length.

03  Consistently outperforms existing methods on multiple long-context benchmarks.

Abstract

Large language models (LLMs) face growing challenges in efficient generative inference due to the increasing memory demands of Key-Value (KV) caches, especially for long sequences. Existing eviction methods typically retain KV pairs with high attention weights but overlook the impact of attention redistribution caused by token removal, as well as the spatial-temporal dynamics in KV selection. In this paper, we propose ReST-KV, a robust KV eviction method that combines layer-wise output Reconstruction and Spatial-Temporal smoothing to provide a more comprehensive perspective for the KV cache eviction task. Specifically, ReST-KV formulates KV cache eviction as an optimization problem that minimizes output discrepancies through efficient layer-wise reconstruction. By directly modeling how each token's removal affects the model output, our method naturally captures attention redistribution…
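To make the attention-redistribution point concrete, here is a minimal sketch in plain Python. It is not the ReST-KV algorithm (the abstract is truncated before the details, and the paper's reconstruction is described as efficient and layer-wise); it is a brute-force, single-query, single-head toy under assumed names (`leave_one_out_scores`, `attention_output`), meant only to show why ranking tokens by attention weight can disagree with ranking them by the output change their removal causes. When a token is dropped, softmax renormalizes over the remaining tokens, so the output shifts by an amount that depends on both the token's weight and how far its value vector lies from the current output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_output(weights, values):
    """Weighted sum of value vectors."""
    dim = len(values[0])
    return [sum(w * v[k] for w, v in zip(weights, values)) for k in range(dim)]


def leave_one_out_scores(query, keys, values):
    """Hypothetical helper: for each cached token, recompute attention
    without it (so the weights redistribute) and measure how far the
    output moves from the full-cache output."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(logits)
    full = attention_output(weights, values)

    scores = []
    for j in range(len(keys)):
        kept_weights = softmax(logits[:j] + logits[j + 1:])  # renormalized
        kept_values = values[:j] + values[j + 1:]
        reduced = attention_output(kept_weights, kept_values)
        scores.append(math.dist(full, reduced))
    return weights, scores


# Toy cache: token 0 gets the largest attention weight, but its value
# vector sits close to the attention output.
query = [1.0, 0.0]
keys = [[0.3, 0.0], [0.2, 0.0], [0.0, 0.0]]
values = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]

weights, scores = leave_one_out_scores(query, keys, values)
print("attention weights:", weights)
print("output discrepancy if removed:", scores)
```

In this toy cache, token 0 has the highest attention weight yet the smallest leave-one-out discrepancy, so a weight-only rule would protect exactly the token whose removal matters least. For a single query the discrepancy also has a closed form, a_j / (1 - a_j) · ||v_j - o||, where a_j is the token's attention weight, v_j its value vector, and o the full-cache output; the loop above simply evaluates it the long way.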


Code & Models

Repositories

an-yongqi/rest-kv (GitHub)

Videos

ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing (SlidesLive)