Residual-Mass Accounting for Partial-KV Decoding
Yasuto Hoshi, Daisuke Miyashita, Jun Deguchi

TL;DR
This paper introduces a residual-mass accounting rule for partial-KV decoding in language models. At a 1% exact-support budget, it improves over a Top-K baseline on RULER and BABILong while keeping summarization results on LongBench favorable.
Contribution
It proposes a residual-mass accounting rule that improves partial-KV decoding without altering the backbone language model or the exact-branch KV tensors.
Findings
Improves over Top-K baseline at 1% exact-support budget on RULER and BABILong.
Maintains favorable summarization results on LongBench.
Residual subtraction leaves the learned feature approximation as the main remaining error source.
Abstract
We study a controlled partial-KV decoding setting in which exact unnormalized softmax contributions are computed for sink/tail anchors and a retrieved token set, while the remaining prefill tokens are represented by a residual estimate. We focus on the accounting rule after the query-dependent exact support has been selected, and use exhaustive Top-K only as an oracle selector, not as a deployable retrieval system. The proposed rule leaves the backbone language model and the exact-branch KV tensors unchanged. It builds fixed-size summary states from learned positive feature maps, subtracts retrieved-token feature contributions to keep the exact and residual sets non-overlapping, and merges the estimated residual numerator and denominator with the exact branch under one normalization. At a 1% exact-support budget, our residual-completion method improves over the…
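The accounting rule described in the abstract can be illustrated with a minimal single-head NumPy sketch. This is not the authors' implementation. The paper uses learned positive feature maps; the sketch substitutes fixed random exponential features (Performer-style) purely as a stand-in. The names `positive_features`, `oracle_support`, and `merged_attention`, the parameters `n_sink`, `n_tail`, and `k`, and the assumption that queries and keys are already scaled by 1/sqrt(d) are all hypothetical choices made for illustration.

```python
import numpy as np

def positive_features(x, W):
    # Stand-in for a learned positive feature map (assumption): random
    # exponential features, strictly positive, with
    # E[phi(q) . phi(k)] = exp(q . k) when rows of W are drawn from N(0, I).
    sq_norm = 0.5 * np.sum(x * x, axis=-1, keepdims=True)
    return np.exp(x @ W.T - sq_norm) / np.sqrt(W.shape[0])

def oracle_support(q, K, n_sink, n_tail, k):
    # Sink/tail anchors plus exhaustive Top-K over the remaining tokens.
    # Exhaustive scoring is an oracle selector, not a deployable retriever.
    n = K.shape[0]
    anchors = np.r_[0:n_sink, n - n_tail:n]
    scores = K @ q
    scores[anchors] = -np.inf  # anchors are included already
    topk = np.argpartition(-scores, k - 1)[:k]
    return np.union1d(anchors, topk)

def merged_attention(q, K, V, exact_idx, W):
    phi_K = positive_features(K, W)            # (n, m)
    # Fixed-size summary states over all prefill tokens.
    S_all = phi_K.T @ V                        # (m, d_v)
    z_all = phi_K.sum(axis=0)                  # (m,)
    # Subtract the exact-support contributions so the exact and
    # residual sets do not overlap.
    S_res = S_all - phi_K[exact_idx].T @ V[exact_idx]
    z_res = z_all - phi_K[exact_idx].sum(axis=0)
    # Estimated residual numerator and denominator for this query.
    phi_q = positive_features(q, W)            # (m,)
    res_num = phi_q @ S_res                    # (d_v,)
    res_den = phi_q @ z_res                    # scalar
    # Exact branch: unnormalized softmax weights on the support.
    w = np.exp(K[exact_idx] @ q)
    # Merge both branches under one normalization.
    return (w @ V[exact_idx] + res_num) / (w.sum() + res_den)

rng = np.random.default_rng(0)
n, d, m = 256, 16, 64
K = rng.normal(size=(n, d)) * 0.3              # assumed pre-scaled keys
V = rng.normal(size=(n, d))
q = rng.normal(size=d) * 0.3                   # assumed pre-scaled query
W = rng.normal(size=(m, d))
support = oracle_support(q, K, n_sink=2, n_tail=2, k=4)
out = merged_attention(q, K, V, support, W)
```

One property of this construction is easy to check: when the exact support covers every prefill token, the residual states cancel and the output reduces to ordinary softmax attention. With a smaller support, the quality of the result depends on how well the feature map approximates the exponential kernel on the residual tokens.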
