Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory

Daniel Goldstein; Eugene Cheah

arXiv:2605.09877·cs.LG·May 18, 2026

TL;DR

This paper introduces Key-Value Means (KVM), an attention mechanism for transformers that supports expandable memory and efficient long-context processing, needs no custom kernels, and combines benefits of transformers and RNNs.

Contribution

The authors propose KVM, a block-recurrent attention mechanism whose memory can be fixed-size or growable, and show that it supports chunk-wise parallel training and prefill using only standard operations.

Findings

01

KVM achieves competitive long-context performance with subquadratic prefill time.

02

KVM supports chunk-wise parallelizable training and prefill operations.

03

KVM can be used on every layer, reducing KV-cache memory and improving long-context decoding.

Abstract

We present Key-Value Means ("KVM"), a novel block-recurrence for attention that can accommodate either fixed-size or growing state. Equipping a strong transformer baseline with fixed-size KVM attention layers yields a strong $O(N)$ chunked RNN, while adding only an insignificant number of new parameters. We train a transformer with a growable KVM cache and show it performs competitively on long-context tests with only subquadratic prefill time and sublinear state growth. KVM is implementable with standard operations and without custom kernels, and supports chunk-wise parallelizable training and prefill. It provides many of the benefits of both traditional transformers (expandable context memory, chunk-wise parallelizable training and prefill) and linear RNNs in a single unified package. It can be used on every layer, saving KV-cache memory, and allowing a continuous range of choices of…
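The abstract does not spell out the KVM update rule, so the following is only an illustrative sketch of the general idea of a fixed-size, block-recurrent key-value memory, not the authors' method. Everything specific here is an assumption made for illustration: the function name `chunked_mean_memory_attention`, the parameters `chunk_size` and `num_slots`, the rule that assigns each key to the slot whose mean key has the largest dot product, and the running-mean update. It is written in plain Python for readability rather than speed.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def chunked_mean_memory_attention(q, k, v, chunk_size, num_slots):
    """Hypothetical sketch: q, k, v are lists of N vectors (lists of floats).

    Returns (outputs, (mem_k, mem_v, counts)). The memory has a fixed number
    of slots, each holding the mean of the keys and values assigned to it.
    """
    d_k, d_v = len(k[0]), len(v[0])
    scale = 1.0 / math.sqrt(d_k)
    mem_k = [[0.0] * d_k for _ in range(num_slots)]
    mem_v = [[0.0] * d_v for _ in range(num_slots)]
    counts = [0] * num_slots
    outputs = []
    for start in range(0, len(q), chunk_size):
        end = min(start + chunk_size, len(q))
        # Only slots that already summarize earlier chunks are visible.
        used = [s for s in range(num_slots) if counts[s] > 0]
        for t in range(start, end):
            # Attend to the compressed memory plus causal positions in this chunk.
            keys = [mem_k[s] for s in used] + k[start:t + 1]
            vals = [mem_v[s] for s in used] + v[start:t + 1]
            w = softmax([dot(q[t], kk) * scale for kk in keys])
            outputs.append(
                [sum(wi * vv[j] for wi, vv in zip(w, vals)) for j in range(d_v)]
            )
        # Recurrent step: fold this chunk's keys and values into the slot means.
        for t in range(start, end):
            empty = [s for s in range(num_slots) if counts[s] == 0]
            if empty:
                s = empty[0]  # assumed rule: fill empty slots first
            else:
                # assumed rule: pick the slot whose mean key best matches k[t]
                s = max(range(num_slots), key=lambda i: dot(k[t], mem_k[i]))
            counts[s] += 1
            c = counts[s]
            mem_k[s] = [m + (x - m) / c for m, x in zip(mem_k[s], k[t])]
            mem_v[s] = [m + (x - m) / c for m, x in zip(mem_v[s], v[t])]
    return outputs, (mem_k, mem_v, counts)
```

In this sketch each token attends to at most `num_slots + chunk_size` entries, so with both held constant the total work grows linearly in the sequence length, which is the kind of behavior the abstract describes for the fixed-size variant. A growable variant would let the number of slots increase with sequence length; the abstract does not say how, so that part is left out.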
