Trellis: Learning to Compress Key-Value Memory in Attention Models

Mahdi Karami; Ali Behrouz; Praneeth Kacham; Vahab Mirrokni

arXiv:2512.23852 · cs.LG · January 1, 2026

Open Access

TL;DR

Trellis is a Transformer architecture that learns to compress its key-value memory into a fixed-size store at test time, bounding memory usage and improving performance on long sequences in language modeling, common-sense reasoning, recall-intensive tasks, and time series.

Contribution

It introduces a learnable, recursive compression mechanism that writes incoming keys and values into a bounded, fixed-size memory, letting Transformers handle long sequences efficiently while updating that memory dynamically.
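To make "bounded memory" concrete, the arithmetic below compares a standard KV cache, which stores one key and one value vector per token, per head, per layer, with a fixed-size memory of `n_slots` key/value pairs. The slot-based layout, the function names, and the model dimensions are hypothetical illustrations, not Trellis's actual configuration; the only point is that one footprint grows with sequence length and the other does not.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_heads: int, d_head: int,
                   bytes_per_elem: int = 2) -> int:
    """Standard KV cache: one key and one value vector per token, per head, per layer."""
    return 2 * n_tokens * n_layers * n_heads * d_head * bytes_per_elem


def fixed_memory_bytes(n_slots: int, n_layers: int, n_heads: int, d_head: int,
                       bytes_per_elem: int = 2) -> int:
    """Hypothetical bounded memory: n_slots key/value pairs, independent of sequence length."""
    return 2 * n_slots * n_layers * n_heads * d_head * bytes_per_elem


if __name__ == "__main__":
    # Illustrative model size (fp16 elements); these numbers are NOT taken from the paper.
    cfg = dict(n_layers=24, n_heads=16, d_head=64)
    mem = fixed_memory_bytes(1_024, **cfg)  # same value for every sequence length
    for n in (4_096, 32_768, 262_144):
        kv = kv_cache_bytes(n, **cfg)
        print(f"{n:>8} tokens: KV cache {kv / 2**30:.2f} GiB, fixed memory {mem / 2**20:.0f} MiB")
```

With these illustrative sizes, the KV cache reaches 3 GiB at 32,768 tokens, while the 1,024-slot memory stays at 96 MiB no matter how long the sequence gets.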

Findings

01

Outperforms strong baselines on language modeling and reasoning tasks.

02

Performance improves as sequence length grows, suggesting the approach scales to long contexts.

03

Updates the compressed memory at test time using online gradient descent with a forget gate.
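The third finding refers to test-time memory updates by online gradient descent. As a hedged illustration only, the equations below write out one generic version of such an update for a linear associative memory M (a d_v × d_k matrix): a per-token reconstruction loss, its gradient, and a step that scales the old memory by (1 - alpha_t), where alpha_t in [0, 1] is a forget gate, and subtracts a gradient step of size eta_t evaluated at M_{t-1}. The matrix-memory form and the symbols alpha_t, eta_t, q_t, and y_t are assumptions; the paper's two-pass mechanism may differ.

```latex
\begin{aligned}
\ell_t(M) &= \tfrac{1}{2}\,\lVert M k_t - v_t \rVert_2^2,
  &\nabla_M \ell_t(M) &= (M k_t - v_t)\,k_t^{\top},\\
M_t &= (1-\alpha_t)\,M_{t-1} - \eta_t\,(M_{t-1} k_t - v_t)\,k_t^{\top},
  &y_t &= M_t\,q_t .
\end{aligned}
```

With alpha_t = 0 and eta_t ||k_t||^2 = 1, this reduces to the delta rule, and M_t k_t = v_t exactly.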

Abstract

Transformers, while powerful, suffer from quadratic computational complexity and the ever-growing Key-Value (KV) cache of the attention mechanism. This paper introduces Trellis, a novel Transformer architecture with bounded memory that learns how to compress its key-value memory dynamically at test time. Trellis replaces the standard KV cache with a fixed-size memory and trains a two-pass recurrent compression mechanism to store new keys and values in that memory. To achieve this, it leverages an online gradient descent procedure with a forget gate, enabling the compressed memory to be updated recursively while learning to retain important contextual information from incoming tokens at test time. Extensive experiments on language modeling, common-sense reasoning, recall-intensive tasks, and time series show that the proposed architecture outperforms strong baselines. Notably, its…
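The abstract describes storing keys and values in a fixed-size memory via online gradient descent with a forget gate. The NumPy sketch below shows that general idea on a single head using a linear associative memory `M` of shape `(d_v, d_k)`: at each token it decays `M` by a forget factor, takes one gradient step on `0.5 * ||M k_t - v_t||^2`, and reads out `M q_t`. It is a generic stand-in, not the paper's two-pass mechanism; the matrix-memory form, the function name `gated_ogd_memory`, and the constant `lr` and `forget` values (which a trained model would presumably produce per token) are all assumptions.

```python
import numpy as np


def gated_ogd_memory(keys: np.ndarray, values: np.ndarray, queries: np.ndarray,
                     lr: float = 0.5, forget: float = 0.05):
    """Compress a stream of (key, value) pairs into a fixed (d_v, d_k) matrix memory.

    Generic sketch (assumption, not the paper's method): at each step,
    M <- (1 - forget) * M - lr * grad, where grad is the gradient of
    0.5 * ||M k_t - v_t||^2 with respect to M, evaluated at the old M.
    """
    n, d_k = keys.shape
    d_v = values.shape[1]
    M = np.zeros((d_v, d_k))        # memory size is fixed, independent of n
    outputs = np.empty((n, d_v))
    for t in range(n):
        k, v, q = keys[t], values[t], queries[t]
        err = M @ k - v             # prediction error of the current memory
        grad = np.outer(err, k)     # gradient of 0.5 * ||M k - v||^2 w.r.t. M
        M = (1.0 - forget) * M - lr * grad  # forget gate + online gradient step
        outputs[t] = M @ q          # read out the updated memory with the query
    return outputs, M
```

Setting `forget=0` and `lr=1` with unit-norm keys reduces the update to the classic delta rule, so querying with the key just written returns its value exactly. A positive `forget` trades that exact recall for gradual decay of stale associations, which is one way to limit interference in a fixed-size memory over long streams.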

Taxonomy

Topics: Big Data and Digital Economy · Machine Learning in Healthcare · Topic Modeling