Dynamic metastability in the self-attention model
Borjan Geshkovski, Hugo Koubbi, Yury Polyanskiy, Philippe Rigollet

TL;DR
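This paper analyzes the self-attention model, an interacting particle system on the unit sphere that serves as a toy model for Transformers. It proves dynamic metastability: particles remain near a configuration of several clusters for an exponentially long time before eventually collapsing to a single cluster. It also links this behavior to the Otto and Reznikoff framework for slow motion of gradient flows.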
Contribution
It establishes dynamic metastability in the self-attention model, as conjectured in [GLPR23], and connects it to the theory of slow motion of gradient flows [OR07]. This gives insight into the long-time behavior of token dynamics in this toy model of Transformers.
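To make the gradient flow connection concrete, here is a sketch of the dynamics and its energy. It assumes the unnormalized variant of the model, in which the interaction weights e^{β⟨x_i,x_j⟩} are averaged rather than softmax-normalized. The notation and constants are ours and may differ from the paper's; the gradient identity is a direct computation, with ∇ the Euclidean gradient and P_x the projection onto the tangent space of the sphere at x.

```latex
\begin{aligned}
&\dot x_i(t) = \mathsf{P}_{x_i(t)}\Big(\frac{1}{n}\sum_{j=1}^{n} e^{\beta\langle x_i(t),x_j(t)\rangle}\,x_j(t)\Big),
\qquad \mathsf{P}_{x}y = y-\langle x,y\rangle x,\qquad x_i(t)\in\mathbb{S}^{d-1},\\
&\mathsf{E}_\beta(x_1,\dots,x_n) = \frac{1}{2\beta n^{2}}\sum_{i,j=1}^{n} e^{\beta\langle x_i,x_j\rangle},
\qquad \nabla_{x_i}\mathsf{E}_\beta = \frac{1}{n^{2}}\sum_{j=1}^{n} e^{\beta\langle x_i,x_j\rangle}\,x_j,\\
&\dot x_i = n\,\mathsf{P}_{x_i}\nabla_{x_i}\mathsf{E}_\beta
\quad\Longrightarrow\quad
\frac{d}{dt}\mathsf{E}_\beta(x(t)) = n\sum_{i=1}^{n}\big\|\mathsf{P}_{x_i}\nabla_{x_i}\mathsf{E}_\beta\big\|^{2} \ge 0.
\end{aligned}
```

In words, under these assumptions the dynamics is a time-rescaled gradient ascent of E_β on the product of spheres, so the energy can only increase along trajectories.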
Findings
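Particles remain near multi-cluster configurations for exponentially long times before collapsing to a single cluster
Under an appropriate time rescaling, the energy has a staircase profile, with trajectories exhibiting saddle-to-saddle-like behavior
Under the same time rescaling, the energy reaches its global maximum in finite time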
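A minimal simulation sketch for exploring these findings numerically. It assumes the unnormalized dynamics, with velocity P_{x_i}((1/n) Σ_j e^{β⟨x_i,x_j⟩} x_j), and the energy E_β = (1/(2βn²)) Σ_{i,j} e^{β⟨x_i,x_j⟩}. The discretization (explicit Euler step plus renormalization onto the sphere), the function names `usa_velocity`, `energy`, `count_clusters` and `simulate`, the greedy cluster count, and all parameter values are hypothetical choices for illustration. They are not the paper's experiments. The sketch records the energy and a rough cluster count at regular times.

```python
import numpy as np


def usa_velocity(x, beta):
    """Velocity P_{x_i}((1/n) sum_j e^{beta <x_i, x_j>} x_j) for unit rows x of shape (n, d)."""
    n = x.shape[0]
    weights = np.exp(beta * (x @ x.T))      # e^{beta <x_i, x_j>}, shape (n, n)
    drift = weights @ x / n                 # (1/n) sum_j e^{beta <x_i, x_j>} x_j
    # Project each drift onto the tangent space at x_i: v - <x_i, v> x_i.
    return drift - np.sum(drift * x, axis=1, keepdims=True) * x


def energy(x, beta):
    """E_beta = (1 / (2 beta n^2)) sum_{i,j} e^{beta <x_i, x_j>}."""
    n = x.shape[0]
    return np.exp(beta * (x @ x.T)).sum() / (2 * beta * n**2)


def count_clusters(x, tol=1e-2):
    """Greedy count of groups whose members satisfy <x_i, x_j> > 1 - tol with a seed particle."""
    remaining = list(range(x.shape[0]))
    clusters = 0
    while remaining:
        seed = remaining[0]
        close = {j for j in remaining if x[seed] @ x[j] > 1 - tol}  # always contains seed
        remaining = [j for j in remaining if j not in close]
        clusters += 1
    return clusters


def simulate(n=32, d=2, beta=4.0, dt=1e-3, steps=20000, record_every=1000, seed=0):
    """Explicit Euler on the sphere; returns final particles and (time, energy, clusters) records."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # random initial points on S^{d-1}
    history = []
    for k in range(steps):
        if k % record_every == 0:
            history.append((k * dt, energy(x, beta), count_clusters(x)))
        x = x + dt * usa_velocity(x, beta)             # tangent Euler step
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # renormalize onto the sphere
    history.append((steps * dt, energy(x, beta), count_clusters(x)))
    return x, history
```

For example, `x, hist = simulate()` returns a list of `(time, energy, clusters)` tuples. With a small step size the recorded energy should increase up to discretization error. Seeing long metastable plateaus would likely require larger β, longer horizons, or the time rescaling discussed in the paper. The defaults above are illustrative only.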
Abstract
We consider the self-attention model, an interacting particle system on the unit sphere that serves as a toy model for Transformers, the deep neural network architecture behind the recent successes of large language models. We prove the appearance of dynamic metastability conjectured in [GLPR23]: although particles collapse to a single cluster as time goes to infinity, they remain trapped near a configuration of several clusters for an exponentially long period of time. By leveraging a gradient flow interpretation of the system, we also connect our result to an overarching framework of slow motion of gradient flows proposed by Otto and Reznikoff [OR07] in the context of coarsening and the Allen-Cahn equation. We finally probe the dynamics beyond the exponentially long period of metastability, and illustrate that, under an appropriate time-rescaling, the energy reaches its global maximum in…
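To illustrate why the particle system is a toy model for Transformers, here is a sketch of one time step read as an attention layer. It assumes the softmax-normalized variant of the dynamics, with query, key and value matrices all equal to the identity (a hypothetical simplification). It also reads the explicit Euler step as a residual connection and the renormalization onto the sphere as a stand-in for layer normalization. The names `softmax_rows` and `attention_layer` are hypothetical, and this is not the paper's code.

```python
import numpy as np


def softmax_rows(a):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def attention_layer(x, beta, dt):
    """One Euler step of the softmax-normalized dynamics for unit-norm tokens x of shape (n, d)."""
    attn = softmax_rows(beta * (x @ x.T))  # A_ij = softmax_j(beta <x_i, x_j>); rows sum to 1
    out = attn @ x                         # attention output: weighted average of tokens
    # Keep only the component tangent to the sphere at x_i.
    tangent = out - np.sum(out * x, axis=1, keepdims=True) * x
    y = x + dt * tangent                   # residual connection
    return y / np.linalg.norm(y, axis=1, keepdims=True)  # back onto the sphere
```

Applying `attention_layer` repeatedly plays the role of stacking layers. A configuration in which all tokens coincide is left unchanged, since the attention output then equals each token and its tangent component vanishes.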
