Training Transformers for KV Cache Compressibility

Yoav Gelberg; Yam Eitan; Michael Bronstein; Yarin Gal; Haggai Maron

arXiv:2605.05971·cs.LG·May 13, 2026


TL;DR

This paper introduces KV-CAT, a training method that encourages transformers to learn more compressible KV-cache representations, improving the efficiency of long-context language modeling.

Contribution

The paper formalizes KV compressibility as a property of the learned representations, rather than of the context alone, and proposes a training procedure that strengthens this property in transformers.

Findings

01. KV-CAT improves downstream compression quality.

02. It enhances the tradeoff between compression and model performance.

03. The method benefits retrieval, long-context QA, and perplexity-based tasks.

Abstract

Long-context language modeling is increasingly constrained by the Key-Value (KV) cache, whose memory and decode-time access costs scale linearly with the prefix length. This bottleneck has motivated a range of context-compression methods, from token-level summarization to recent optimization-based KV compression methods. These post-hoc methods operate on the KV cache of a fixed pretrained model, so their effectiveness is fundamentally limited by how well the model's internal representations can be compressed. In this work, we formalize the notion of KV compressibility and show that it is a property of the learned representations, rather than of the context alone. We prove that almost any sequence-to-vector function admits both highly compressible and inherently non-compressible transformer implementations, highlighting the need to guide transformers toward compressible representations…
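To make the linear scaling mentioned in the abstract concrete, the following is a minimal back-of-the-envelope sketch of KV cache memory as a function of prefix length. It is not taken from the paper and does not implement KV-CAT; the function name `kv_cache_bytes` and the model configuration (32 layers, 8 KV heads, head dimension 128, 2-byte elements) are hypothetical values chosen only for illustration.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2, batch_size=1):
    """Bytes needed to store keys and values for `seq_len` cached tokens."""
    # Factor of 2: each layer caches one key tensor and one value tensor.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return batch_size * seq_len * per_token


# Hypothetical configuration, assumed for illustration only.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for n_tokens in (4096, 32768, 131072):
    gib = kv_cache_bytes(n_tokens, **cfg) / 2**30
    print(f"{n_tokens:>7} tokens -> {gib:.2f} GiB")
```

Under these assumed settings each cached token costs 128 KiB, so doubling the prefix length doubles the cache. A compression method that retains a fraction of the cache entries would reduce this footprint by roughly the same fraction.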
