Fundamental limits of learning in sequence multi-index models and deep attention networks: High-dimensional asymptotics and sharp thresholds

Emanuele Troiani; Hugo Cui; Yatin Dandi; Florent Krzakala; Lenka Zdeborová

arXiv:2502.00901 · cs.LG · November 13, 2025

TL;DR

This paper analyzes the fundamental limits of learning deep attention networks by mapping them to sequence multi-index models, deriving sharp asymptotic performance thresholds, and revealing how layers are learned sequentially in high dimensions.

Contribution

The paper maps deep attention networks with tied, low-rank weights to sequence multi-index models, and uses this mapping to derive sharp asymptotic performance thresholds in the high-dimensional limit.

Findings

1. Sharp thresholds for sample complexity in high dimensions
2. Sequential layer learning dynamics uncovered
3. Asymptotic optimal performance characterized

Abstract

In this manuscript, we study the learning of deep attention neural networks, defined as the composition of multiple self-attention layers, with tied and low-rank weights. We first establish a mapping of such models to sequence multi-index models, a generalization of the widely studied multi-index model to sequential covariates, for which we establish a number of general results. In the context of Bayesian-optimal learning, in the limit of large dimension $D$ and commensurably large number of samples $N$, we derive a sharp asymptotic characterization of the optimal performance as well as the performance of the best-known polynomial-time algorithm for this setting, namely approximate message-passing, and characterize sharp thresholds on the minimal sample complexity required for better-than-random prediction performance. Our analysis uncovers, in particular, how the different layers…
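The mapping to a sequence multi-index model can be illustrated with a minimal NumPy sketch. It assumes a single self-attention layer with tied, low-rank weights whose attention matrix is `softmax(X W Wᵀ Xᵀ / D)`; the paper's exact normalization, skip connections, and depth are not stated in this abstract, so the function names, shapes, and scaling below are illustrative assumptions only. The point shown is that the attention matrix depends on the sequence `X` (of shape `L × D`) only through the `L × r` projections `X W`, which is the defining property of a multi-index model extended to sequential covariates.

```python
import numpy as np

def softmax_rows(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def tied_attention_matrix(X, W):
    # Hypothetical single layer: query and key share the same low-rank W (tied weights).
    # X: (L, D) sequence of L tokens, W: (D, r) with r << D.
    D = X.shape[1]
    H = X @ W / np.sqrt(D)          # (L, r) low-dimensional projections ("indices")
    return softmax_rows(H @ H.T)    # (L, L) attention matrix, a function of H only

rng = np.random.default_rng(0)
L, D, r = 4, 200, 2                 # illustrative sizes, not taken from the paper
X = rng.standard_normal((L, D))
W = rng.standard_normal((D, r))
A = tied_attention_matrix(X, W)

# Perturb X only in directions orthogonal to the column span of W.
Q, _ = np.linalg.qr(W)              # (D, r) orthonormal basis of span(W)
noise = rng.standard_normal((L, D))
noise_perp = noise - (noise @ Q) @ Q.T
A_perturbed = tied_attention_matrix(X + noise_perp, W)

# The attention matrix is unchanged: only the r projected directions matter.
print(np.allclose(A, A_perturbed))
```

Under these assumptions, learning the layer amounts to recovering the `r` relevant directions in `W` from samples, which is the kind of question the sample-complexity thresholds in the abstract address.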

Code & Models

Repositories

spoc-group/sequenceindexmodels (official)

Videos

Fundamental limits of learning in sequence multi-index models and deep attention networks: high-dimensional asymptotics and sharp thresholds (SlidesLive)

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Attention Is All You Need