Bayes optimal learning of attention-indexed models

Fabrizio Boncoraglio; Emanuele Troiani; Vittorio Erba; Lenka Zdeborová

arXiv:2506.01582 · cs.LG · February 3, 2026

TL;DR

This paper introduces the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. It yields closed-form predictions for Bayes-optimal generalization error, identifies sharp phase transitions, and proposes a matching approximate message passing algorithm.

Contribution

AIM is an analytically tractable model of attention that, unlike prior tractable attention models, allows full-width key and query matrices, which aligns it more closely with practical transformers.

Findings

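01. Closed-form predictions for the Bayes-optimal generalization error.

02. Sharp phase transitions as a function of sample complexity, model width, and sequence length.

03. A matching approximate message passing algorithm, together with evidence that gradient descent can reach optimal performance.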

Abstract

We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in self-attention layers, which are key components of modern…
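As a rough illustration of the "bilinear interactions over high-dimensional embeddings" mentioned in the abstract, the sketch below computes a single softmax attention matrix from full-width key and query matrices. It is not the paper's definition of AIM: the function names, the single-layer setup, the Gaussian random inputs, and the 1/sqrt(r) and 1/sqrt(d) scalings are all assumptions made for this example.

```python
import numpy as np


def softmax_rows(z):
    """Numerically stable softmax applied independently to each row."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def bilinear_attention(X, Q, K):
    """Hypothetical single-layer sketch (not the paper's exact model).

    X: (T, d) token embeddings, one row per token.
    Q, K: (d, r) query and key matrices; r == d is the full-width case.
    Returns a (T, T) row-stochastic attention matrix.
    """
    d = X.shape[1]
    r = Q.shape[1]
    S = Q @ K.T / np.sqrt(r)      # (d, d) bilinear form; scaling is an assumption
    H = X @ S @ X.T / np.sqrt(d)  # (T, T) token-token scores; scaling is an assumption
    return softmax_rows(H)


rng = np.random.default_rng(0)
T, d = 4, 64                      # sequence length and embedding dimension (illustrative)
r = d                             # full-width key and query matrices
X = rng.standard_normal((T, d))
Q = rng.standard_normal((d, r))
K = rng.standard_normal((d, r))

A = bilinear_attention(X, Q, K)   # one attention weight per pair of tokens
```

Each row of `A` sums to one and weights the tokens attended to by the corresponding query token. Setting `r < d` in this sketch would give a low-rank bilinear form instead of the full-width case.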

Code & Models

Repositories

SPOC-group/ExtensiveRankAttention (PyTorch, official)

Videos

Bayes optimal learning of attention-indexed models · SlidesLive

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Attention Is All You Need