RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction

Sihao Liu; YuFan Xiong; Zhonghua Jiang; Zhaode Wang; chengfei lv; Shengyu Zhang

arXiv:2605.04075 · cs.LG · May 7, 2026

TL;DR

RetentiveKV introduces a state-space memory approach to uncertainty-aware KV cache eviction in multimodal large language models, reporting 5.0× KV cache compression and 1.5× decoding acceleration while preserving important visual tokens.

Contribution

The paper proposes an entropy-driven method that treats KV cache eviction as continuous memory evolution rather than discrete context truncation, addressing limitations of existing pruning techniques in multimodal settings.

Findings

01. Achieves 5.0× KV cache compression
02. Provides 1.5× decoding acceleration
03. Effectively preserves important visual tokens during decoding
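To give the compression figure a sense of scale, the short calculation below estimates KV cache memory for one sequence and divides it by the reported 5.0× ratio. The model dimensions, token count, and 2-byte precision are hypothetical values chosen for illustration, not numbers from the paper, and the sketch assumes the ratio applies to the whole cache.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_value=2):
    """Bytes held by cached keys and values for a single sequence."""
    # The leading factor 2 covers the separate key and value tensors.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value


# Hypothetical configuration (assumption, not taken from the paper).
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=20_000)

# Apply the reported 5.0x compression ratio to the full cache size.
compressed = full / 5.0

print(f"full cache:       {full / 2**30:.2f} GiB")
print(f"after 5.0x ratio: {compressed / 2**30:.2f} GiB")
```

The cache grows linearly with the number of tokens, which is why long visual contexts dominate memory use.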

Abstract

Multimodal Large Language Models face severe challenges in computational efficiency and memory consumption due to the substantial expansion of the visual KV cache when processing long visual contexts. Existing KV cache compression methods typically rely on the "persistence of importance" hypothesis to prune tokens. However, this approach proves fragile in multimodal settings due to two key issues: 1) Visual tokens display "deferred importance," initially exhibiting low salience but becoming pivotal during later decoding, which can lead to premature eviction. 2) Discrete pruning disrupts the inherent spatial continuity of visual cues. To address these challenges, we propose RetentiveKV, an entropy-driven KV cache optimization method that reformulates KV eviction from "discrete context truncation" to "continuous memory evolution" based on State Space Models. Our method leverages…
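Because the abstract is cut off before the mechanism is described, the following is only a toy sketch of the general idea, not the paper's algorithm. It assumes attention entropy serves as the uncertainty signal and that an evicted token is folded into a running summary through a simple linear recurrence instead of being dropped. The function names, the eviction rule, and the update formula are all hypothetical.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def fold_into_state(state, value, decay):
    """Blend an evicted value vector into a running summary state.

    Uses s <- decay * s + (1 - decay) * v, a minimal linear recurrence.
    This update is an assumption, not the formula from the paper.
    """
    return [decay * s + (1.0 - decay) * v for s, v in zip(state, value)]


# Toy cache: (attention weight, value vector) for three visual tokens.
cache = [(0.70, [1.0, 0.0]), (0.20, [0.0, 1.0]), (0.10, [1.0, 1.0])]
weights = [w for w, _ in cache]

# Normalized entropy in [0, 1]; higher means more diffuse, less certain attention.
uncertainty = attention_entropy(weights) / math.log(len(weights))

# Hypothetical rule: evict the token with the lowest attention weight, and keep
# more of its content in the summary when attention is uncertain, since a
# low-salience token may matter later ("deferred importance").
evict_index = min(range(len(cache)), key=lambda i: cache[i][0])
_, evicted_value = cache[evict_index]
kept = [entry for i, entry in enumerate(cache) if i != evict_index]

state = fold_into_state([0.0, 0.0], evicted_value, decay=1.0 - uncertainty)

print(f"normalized entropy: {uncertainty:.3f}")
print(f"tokens kept: {len(kept)}, summary state: {state}")
```

The contrast with discrete pruning is that the evicted token still contributes to `state`, so its information is attenuated rather than lost outright.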
