Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs
Mana Sakai, Ryo Karakida, Masaaki Imaizumi

TL;DR
This paper uses the Tensor Programs framework to rigorously derive the infinite-width limit distribution of a single attention layer. The limit is non-Gaussian, so it captures behavior that the usual Gaussian approximations of infinite-width theory miss.
Contribution
It applies the Tensor Programs framework to the infinite-width limit of an attention layer without simplifying assumptions such as infinitely many heads or tailored scalings.
Findings
The limit distribution is non-Gaussian and hierarchical.
Numerical experiments confirm the theory's accuracy at finite widths.
The results lay groundwork for a unified theory of deep Transformer architectures in the infinite-width regime.
Abstract
In modern theoretical analyses of neural networks, the infinite-width limit is often invoked to justify Gaussian approximations of neuron preactivations (e.g., via neural network Gaussian processes or Tensor Programs). However, these Gaussian-based asymptotic theories have so far been unable to capture the behavior of attention layers, except under special regimes such as infinitely many heads or tailored scaling schemes. In this paper, leveraging the Tensor Programs framework, we rigorously identify the infinite-width limit distribution of variables within a single attention layer under realistic architectural dimensionality and standard $1/\sqrt{n}$-scaling with $n$ dimensionality. We derive the exact form of this limit law without resorting to infinite-head approximations or tailored scalings, demonstrating that it departs fundamentally from Gaussianity. This limiting distribution…
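A small Monte Carlo sketch can make the non-Gaussianity claim concrete. It is not the authors' experiment: the single-head layout, the N(0, 1/n) weight initialization, the fixed random inputs, the default sizes, and the names `attention_head`, `excess_kurtosis`, and `simulate` are all assumptions chosen for illustration. The sketch pools one token's output coordinates over many independent weight draws and measures their excess kurtosis, which would be close to zero if the outputs were Gaussian.

```python
import numpy as np


def attention_head(x, w_q, w_k, w_v):
    """Single softmax attention head with 1/sqrt(n) score scaling.

    x has shape (s, n): s tokens, width n. Weights have shape (n, n).
    """
    n = x.shape[1]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(n)               # standard 1/sqrt(n) scaling
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v                            # shape (s, n)


def excess_kurtosis(samples):
    """Sample excess kurtosis; approximately 0 for Gaussian data."""
    z = (samples - samples.mean()) / samples.std()
    return float(np.mean(z ** 4) - 3.0)


def simulate(n=64, s=8, trials=2000, input_std=np.sqrt(2.0), seed=0):
    """Collect token-0 outputs over independent weight draws (assumed setup)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, input_std, size=(s, n))   # inputs fixed across trials
    outs = np.empty((trials, n))
    for t in range(trials):
        # Assumed initialization: i.i.d. N(0, 1/n) entries for W_Q, W_K, W_V.
        w_q, w_k, w_v = rng.normal(0.0, 1.0 / np.sqrt(n), size=(3, n, n))
        outs[t] = attention_head(x, w_q, w_k, w_v)[0]
    return outs.ravel()


if __name__ == "__main__":
    out = simulate()
    gauss = np.random.default_rng(1).normal(size=out.size)
    print("attention output excess kurtosis:", excess_kurtosis(out))
    print("Gaussian baseline excess kurtosis:", excess_kurtosis(gauss))
```

Under these assumptions the similarity scores keep an order-one random fluctuation as n grows, because each score is a sum of n mean-zero products divided by $\sqrt{n}$. The output is then a softmax-weighted sum of Gaussian value coordinates whose weights are themselves random and shared across coordinates, which is one way to read the hierarchical structure mentioned in the findings.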
Taxonomy
Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques
