Infinite Limits of Multi-head Transformer Dynamics

Blake Bordelon; Hamza Tahir Chaudhry; Cengiz Pehlevan

arXiv:2405.15712·stat.ML·October 7, 2024·3 cites

Open Access

TL;DR

This paper investigates the training dynamics of transformer models in the feature learning regime, analyzing various infinite-width and depth limits using dynamical mean field theory to understand how parameterization affects learned features.

Contribution

It identifies parameterizations that allow well-defined infinite limits and analyzes different infinite regimes of transformers, providing a theoretical framework for understanding their training dynamics.

Findings

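01. The infinite key/query dimension, infinite heads, and infinite depth limits each have a distinct statistical description, which also depends on how the attention layers are scaled.

02. The choice of parameterization qualitatively influences the features learned by transformers.

03. Numerical evidence supports convergence to the theoretical limits.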

Abstract

In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime. We identify the set of parameterizations that admit well-defined infinite width and depth limits, allowing the attention layers to update throughout training--a relevant notion of feature learning in these models. We then use tools from dynamical mean field theory (DMFT) to analyze various infinite limits (infinite key/query dimension, infinite heads, and infinite depth) which have different statistical descriptions depending on which infinite limit is taken and how attention layers are scaled. We provide numerical evidence of convergence to the limits and discuss how the parameterization qualitatively influences learned features.
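
To make the phrase "how attention layers are scaled" concrete, the following is a minimal sketch, not the authors' code. It assumes the scaling enters as a factor N^(-alpha) on the query-key dot product, where N is the key/query dimension. The exponent `alpha`, the Gaussian random initialization, and all function names are hypothetical choices made for illustration and do not come from the abstract.

```python
import math
import random


def attention_logits(queries, keys, alpha):
    """Dot-product logits scaled by N**(-alpha), with N the key/query dimension.

    alpha is an illustrative knob (an assumption, not the paper's notation):
    alpha = 0.5 gives the familiar 1/sqrt(N) scaling, alpha = 1.0 gives 1/N.
    """
    n = len(queries[0])
    scale = n ** (-alpha)
    return [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, alpha):
    """Attention matrix: one softmax-normalized row per query."""
    return [softmax(row) for row in attention_logits(queries, keys, alpha)]


def random_vectors(count, dim, rng):
    """Independent standard Gaussian vectors (an assumed initialization)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


def logit_rms(dim, alpha, tokens=8, seed=0):
    """Root-mean-square size of the logits for random queries and keys."""
    rng = random.Random(seed)
    q = random_vectors(tokens, dim, rng)
    k = random_vectors(tokens, dim, rng)
    flat = [x for row in attention_logits(q, k, alpha) for x in row]
    return math.sqrt(sum(x * x for x in flat) / len(flat))


if __name__ == "__main__":
    for dim in (16, 256, 1024):
        print(dim, logit_rms(dim, alpha=0.5), logit_rms(dim, alpha=1.0))
```

Under these assumptions, independent random queries and keys give logits of roughly constant size as N grows when `alpha = 0.5`, and logits that shrink with N when `alpha = 1.0`. This only illustrates why the choice of scaling matters at initialization; it says nothing about the training dynamics that the paper analyzes.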

Taxonomy

Topics: Control and Stability of Dynamical Systems · Control Systems in Engineering · Physics and Engineering Research Articles

Methods: Sparse Evolutionary Training