Nearly Optimal Attention Coresets

Edo Liberty; Alexandr Andoni; Eldar Kleiner

arXiv:2605.05602·cs.DS·May 8, 2026

TL;DR

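This paper proves that the Attention mechanism admits coresets of nearly optimal size: a small subset of the keys and values approximates Attention to within $\varepsilon$ for every query of bounded norm. The result improves on the best previously known bounds and comes with an improved lower bound.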

Contribution

The paper establishes the existence of nearly optimal size coresets for Attention estimation over unit-norm keys and values, together with an improved lower bound on the size of any $\varepsilon$-coreset.

Findings

01

Existence of coresets of size $O(\sqrt{d}\, e^{\rho+o(\rho)}/\varepsilon)$ for Attention

02

Coresets approximate Attention within $\varepsilon$ simultaneously for all queries of norm at most $\rho$

03

Lower bounds show coresets must have size at least $\Omega(\sqrt{d}\, e^{\rho}/\varepsilon)$
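
To see why the two size bounds above are described as nearly optimal, one can divide the upper bound by the lower bound. This is a rough sketch that treats the hidden constants as fixed and only tracks the dependence on $d$, $\rho$, and $\varepsilon$:

\[ \frac{\sqrt{d}\, e^{\rho+o(\rho)}/\varepsilon}{\sqrt{d}\, e^{\rho}/\varepsilon} = e^{o(\rho)} \]

Under that reading, the bounds agree in their dependence on $d$ and $\varepsilon$ and differ only by a factor that is subexponential in $\rho$.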

Abstract

We consider the problem of estimating the Attention mechanism in small space, and prove the existence of coresets for it of nearly optimal size. Specifically, we show that for any set of unit-norm keys and values $(K, V)$ in $\mathbb{R}^{d}$, there exists a subset $(K', V')$ of size at most $O(\sqrt{d}\, e^{\rho + o(\rho)} / \varepsilon)$ such that \[ \left\| \operatorname{Attn}(q,K,V)- \operatorname{Attn}(q,K',V') \right\| \le \varepsilon \] simultaneously for all queries whose norm is bounded by $\rho$. This outperforms the best known results for this problem. We also offer an improved lower bound showing that $\varepsilon$-coresets must have size $\Omega(\sqrt{d}\, e^{\rho} / \varepsilon)$.
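
The guarantee in the abstract can be made concrete with a small numerical sketch. The abstract does not spell out the definition of $\operatorname{Attn}$, so the code below assumes the standard softmax form, $\operatorname{Attn}(q,K,V) = \sum_i \operatorname{softmax}(\langle q, k_i \rangle)\, v_i$, with no $1/\sqrt{d}$ scaling. The helper names (`attn`, `random_unit`, `max_error`) are hypothetical, and the uniformly random subset is only a baseline for illustration, not the construction from the paper. Sampling queries gives an empirical lower estimate of the worst-case error, whereas the paper's bound holds for all queries with norm at most $\rho$ at once.

```python
import math
import random


def attn(q, K, V):
    """Softmax attention for a single query (assumed form, no 1/sqrt(d) scaling)."""
    scores = [sum(qj * kj for qj, kj in zip(q, k)) for k in K]
    m = max(scores)  # subtract the max score for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, V)) / z for j in range(len(V[0]))]


def random_unit(d, rng):
    """Draw a unit-norm vector in R^d by normalizing a Gaussian sample."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(t * t for t in x))
    return [t / n for t in x]


def max_error(K, V, idx, rho, trials, rng):
    """Largest observed ||Attn(q,K,V) - Attn(q,K',V')|| over sampled queries
    with norm at most rho, where (K', V') keeps only the rows in idx."""
    Ks = [K[i] for i in idx]
    Vs = [V[i] for i in idx]
    d = len(K[0])
    worst = 0.0
    for _ in range(trials):
        r = rho * rng.random()  # query norm in [0, rho)
        q = [r * t for t in random_unit(d, rng)]
        full = attn(q, K, V)
        sub = attn(q, Ks, Vs)
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(full, sub)))
        worst = max(worst, err)
    return worst


rng = random.Random(0)
n, d, rho = 200, 8, 1.0
K = [random_unit(d, rng) for _ in range(n)]  # unit-norm keys
V = [random_unit(d, rng) for _ in range(n)]  # unit-norm values
idx = rng.sample(range(n), 50)               # baseline: uniformly random subset
err_subset = max_error(K, V, idx, rho, 100, rng)
err_full = max_error(K, V, list(range(n)), rho, 100, rng)
print("random 50-subset, max observed error:", err_subset)
print("full set, max observed error:", err_full)
```

Because both attention outputs are convex combinations of unit-norm values, the error can never exceed 2, so $\varepsilon$ is only meaningful well below that. Any candidate coreset can be evaluated the same way by passing its indices as `idx`.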
