Dynamical Mean-Field Theory of Self-Attention Neural Networks
Ángel Poc-López, Miguel Aguilera

TL;DR
This paper develops an analytical framework for understanding the complex dynamics of transformer-based self-attention neural networks, revealing phase transitions and chaotic behavior, which could improve interpretability and training efficiency.
Contribution
It introduces a novel analytical approach using path integral methods to study the nonequilibrium dynamics of large self-attention networks, connecting them to Hopfield models.
Findings
Revealed nontrivial dynamical phenomena including phase transitions and chaos.
Derived analytical approximations for large self-attention networks with 1-bit tokens and weights.
Identified potential for reducing training costs and improving interpretability.
Abstract
Transformer-based models have demonstrated exceptional performance across diverse domains, becoming the state-of-the-art solution for addressing sequential machine learning problems. Even though we have a general understanding of the fundamental components in the transformer architecture, little is known about how they operate or what their expected dynamics are. Recently, there has been increasing interest in exploring the relationship between attention mechanisms and Hopfield networks, promising to shed light on the statistical physics of transformer networks. However, to date, the dynamical regimes of transformer-like models have not been studied in depth. In this paper, we address this gap by using methods for the study of asymmetric Hopfield networks in nonequilibrium regimes, namely path integral methods over generating functionals, yielding dynamics governed by concurrent…
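The relationship between attention mechanisms and Hopfield networks mentioned in the abstract can be illustrated with a small sketch. This is not the model analyzed in the paper (which, per the abstract, concerns asymmetric Hopfield networks in nonequilibrium regimes); it only shows the widely cited correspondence in which a softmax attention step over stored patterns acts as an associative-memory retrieval. The function names, the inverse temperature `beta`, and the pattern sizes are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def hopfield_attention_step(patterns, state, beta):
    """One retrieval step: `state` acts as the query, and the stored
    `patterns` act as both keys and values (illustrative assumption)."""
    scores = beta * (patterns @ state)   # similarity of the query to each pattern
    weights = softmax(scores)            # attention weights over stored patterns
    return weights @ patterns            # attention-weighted average of patterns

rng = np.random.default_rng(0)
n_patterns, dim = 5, 64
# Stored patterns with 1-bit (+1/-1) entries
patterns = rng.choice([-1.0, 1.0], size=(n_patterns, dim))

# Corrupt the first pattern by flipping 8 of its 64 entries
query = patterns[0].copy()
flip = rng.choice(dim, size=8, replace=False)
query[flip] *= -1

# One attention step followed by a sign nonlinearity to return to 1-bit values
retrieved = np.sign(hopfield_attention_step(patterns, query, beta=1.0))

# Normalized overlap between the retrieved state and the original pattern
overlap = float(retrieved @ patterns[0]) / dim
print("overlap with stored pattern:", overlap)
```

With few random patterns and a mildly corrupted query, the attention weights concentrate on the closest stored pattern, so the overlap should be close to 1. Lowering `beta` or storing many more patterns spreads the weights and degrades retrieval, which is the kind of regime change that a dynamical analysis would aim to characterize.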
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax
