Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention

Yuzhen Mao; Michael Y. Li; Emily B. Fox

arXiv:2604.20920 · cs.LG · April 24, 2026

TL;DR

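This paper introduces a learnable, coarse-to-fine sparse attention mechanism in which gist compression tokens act as routing signals for selecting raw context chunks. It improves long-context processing in language models and outperforms other compression baselines on LongBench and RAG benchmarks.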

Contribution

The paper proposes an end-to-end trainable framework that combines gist compression with selective unfolding for efficient long-context attention, without requiring architecture modification.

Findings

01. Outperforms other compression baselines on LongBench and RAG benchmarks.

02. Achieves logarithmic complexity in multi-resolution context access.

03. Effective at compression ratios from 8x to 32x.
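
As a rough illustration of what these compression ratios mean for the attention budget, the sketch below counts how many keys a query would attend to. It assumes one gist token per `ratio` raw tokens and that each unfolded chunk holds `ratio` raw tokens. The function `attended_keys` and the parameter `top_k` are hypothetical names chosen for this example, and the printed numbers are plain arithmetic, not results from the paper.

```python
import math


def attended_keys(context_len: int, ratio: int, top_k: int) -> tuple[int, int]:
    """Return (number of gists, total keys attended) for one query.

    ASSUMPTION: one gist summarizes `ratio` raw tokens, and unfolding a
    selected gist restores exactly that chunk of `ratio` raw tokens.
    """
    num_gists = math.ceil(context_len / ratio)
    # Raw tokens restored by unfolding the top_k selected gists.
    unfolded = min(min(top_k, num_gists) * ratio, context_len)
    return num_gists, num_gists + unfolded


for ratio in (8, 16, 32):
    gists, total = attended_keys(32_768, ratio, top_k=4)
    print(f"{ratio}x: {gists} gists, {total} keys attended (full attention: 32768)")
```

Under these assumptions, a higher ratio shrinks the gist set while each unfolded chunk grows, so the total stays far below the full context length across the 8x to 32x range.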

Abstract

Scaling large language models to long contexts is challenging due to the quadratic computational cost of full attention. Mitigation approaches include KV-cache selection or compression techniques. We instead provide an effective and end-to-end learnable bridge between the two without requiring architecture modification. In particular, our key insight is that interleaved gist compression tokens -- which provide a learnable summary of sets of raw tokens -- can serve as routing signals for sparse attention. Building on this, we introduce selective unfolding via Gist Sparse Attention (GSA), which first compresses the context into gist tokens, then selects the most relevant gists, and subsequently restores the corresponding raw chunks for detailed attention. This yields a simple coarse-to-fine mechanism that combines compact global representations with targeted access to fine-grained evidence. We further…
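
The compress, select, and restore steps described in the abstract can be sketched for a single query as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: mean pooling stands in for the learned gist tokens, the final attention runs jointly over all gists plus the unfolded raw chunks, and the names `selective_unfolding_attention`, `chunk_size`, and `top_k` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def selective_unfolding_attention(q, K, V, chunk_size=8, top_k=2):
    """Coarse-to-fine attention for one query vector.

    q: (d,) query; K, V: (n, d) keys and values, with n divisible by chunk_size.
    Returns the attention output (d,) and the indices of the unfolded chunks.
    """
    n, d = K.shape
    assert n % chunk_size == 0, "sketch assumes equally sized chunks"
    num_chunks = n // chunk_size
    Kc = K.reshape(num_chunks, chunk_size, d)
    Vc = V.reshape(num_chunks, chunk_size, d)

    # Step 1 (compress). ASSUMPTION: mean pooling replaces the learned
    # gist tokens that the paper trains end to end.
    gist_K = Kc.mean(axis=1)
    gist_V = Vc.mean(axis=1)

    # Step 2 (select): score each gist against the query, keep the top_k.
    gist_scores = gist_K @ q / np.sqrt(d)
    k = min(top_k, num_chunks)
    selected = np.sort(np.argsort(gist_scores)[-k:])

    # Step 3 (restore): unfold the selected gists back into raw tokens.
    raw_K = Kc[selected].reshape(-1, d)
    raw_V = Vc[selected].reshape(-1, d)

    # ASSUMPTION: attend over every gist (global view) plus the
    # unfolded raw tokens (fine-grained evidence) in one softmax.
    keys = np.concatenate([gist_K, raw_K])
    values = np.concatenate([gist_V, raw_V])
    weights = softmax(keys @ q / np.sqrt(d))
    return weights @ values, selected


# Usage: 16 tokens in 4 chunks; only chunk 2 has keys aligned with the query.
q = np.array([1.0, 0.0, 0.0, 0.0])
K = np.zeros((16, 4))
K[8:12] = q
V = np.arange(64, dtype=float).reshape(16, 4)
out, selected = selective_unfolding_attention(q, K, V, chunk_size=4, top_k=1)
print(selected, out.shape)
```

In this toy setup the query attends to 4 gists plus 4 unfolded raw tokens instead of all 16 raw tokens, which is the coarse-to-fine trade the abstract describes.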

Code & Models

Repositories

yuzhenmao/gist-sparse-attention (GitHub)
