Minimal-Intervention KV Retention via Set-Conditioned Diversity

Libo Sun; Po-wei Harn; Peixiong He; Xiao Qin

arXiv:2605.14292·cs.LG·May 19, 2026

TL;DR

This paper introduces alpha, a simple modification to KV-cache retention scoring that outperforms more complex redesigns on small-budget mathematical reasoning tasks.

Contribution

The paper proposes alpha, a one-function change to the TriAttention retention scorer, and shows that it outperforms structural redesigns for KV-cache compression in reasoning models.

Findings

01 Alpha outperforms heavier redesigns at small budgets.

02 Alpha clears Bonferroni correction in two model/budget settings.

03 Minimal scoring modification beats structural redesigns in this regime.

Abstract

KV-cache compression at small budgets is a crowded design space spanning cache representation, head-wise routing, compression cadence, decoding behavior, and within-budget scoring. We study seven mechanisms across these five families under matched mean cache on long-form mathematical reasoning (MATH-500~\cite{hendrycks2021math}) with two distilled-reasoning models (Qwen-7B and Llama-8B variants of DeepSeek-R1-Distill~\cite{deepseek2025r1}) at budgets $b \in \{64, 128\}$. All seven were rejected. We then propose $α$, a one-function modification to the TriAttention~\cite{mao2026triattention} retention scorer that replaces argmax-top-$k$ with greedy facility-location-inspired selection under a V-space redundancy penalty controlled by a single weight $λ$. A pre-registered protocol tunes $λ$ on a frozen development split and confirms on a disjoint held-out split; with…
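
The abstract describes the selection rule only in words, so the following is a rough sketch of what greedy selection under a value-space redundancy penalty could look like, not the paper's implementation. Everything here is an assumption made for illustration: the function name `greedy_diverse_select`, the use of cosine similarity between value vectors as the redundancy measure, the max-over-kept-set form of the penalty, and the clipping of negative similarities to zero. The per-token `scores` stand in for whatever the TriAttention scorer produces, which is not specified in the text above.

```python
import numpy as np


def greedy_diverse_select(scores, values, k, lam):
    """Greedily keep k token indices (hypothetical sketch, not the paper's code).

    At each step, pick the token maximizing
        scores[i] - lam * max_{j in kept} cos(values[i], values[j]).
    With lam == 0 this reduces to ordinary top-k by score.
    """
    n = scores.shape[0]
    k = min(k, n)

    # Unit-normalize value vectors so dot products are cosine similarities.
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    unit = values / np.maximum(norms, 1e-12)

    # Highest similarity of each token to anything kept so far.
    # Starting at zero means negative similarities never earn a bonus.
    max_sim = np.zeros(n)
    available = np.ones(n, dtype=bool)
    kept = []

    for _ in range(k):
        gain = scores - lam * max_sim
        gain = np.where(available, gain, -np.inf)  # never re-pick a kept token
        i = int(np.argmax(gain))
        kept.append(i)
        available[i] = False
        # Update redundancy of every token against the newly kept one.
        max_sim = np.maximum(max_sim, unit @ unit[i])

    return kept


# Toy example: tokens 0 and 1 have identical value vectors.
scores = np.array([1.0, 0.9, 0.5])
values = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
top_k = greedy_diverse_select(scores, values, k=2, lam=0.0)
diverse = greedy_diverse_select(scores, values, k=2, lam=1.0)
```

In the toy example, `lam=0.0` keeps the two highest-scoring tokens even though their values are duplicates, while `lam=1.0` swaps the duplicate for the lower-scoring but distinct token. The single weight plays the role the abstract assigns to $λ$: it trades raw score against redundancy within the retained set.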
