Multistability of Self-Attention Dynamics in Transformers
Claudio Altafini

TL;DR
This paper models self-attention in transformers as a continuous-time multiagent dynamical system related to the Oja flow, classifies its equilibria, and analyzes their stability and their alignment with eigenvectors of the value matrix.
Contribution
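It casts single-head self-attention as a multiagent dynamical system, sorts its equilibria into four classes (consensus, bipartite consensus, clustering, and polygonal), and studies their stability and their alignment with eigenvectors of the value matrix.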
Findings
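Multiple asymptotically stable equilibria from the consensus, bipartite consensus, and clustering classes often coexist in self-attention dynamics.
Consensus and bipartite consensus equilibria are always aligned with eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.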
The model relates to the multiagent Oja flow, providing new insights into attention mechanisms.
Abstract
In machine learning, a self-attention dynamics is a continuous-time multiagent-like model of the attention mechanisms of transformers. In this paper we show that such dynamics is related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix which, for transformers, corresponds to the value matrix. We classify the equilibria of the "single-head" self-attention system into four classes: consensus, bipartite consensus, clustering, and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist in the self-attention dynamics. Interestingly, equilibria from the first two classes are always aligned with the eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.
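The abstract's remark that the Oja flow computes a principal eigenvector can be illustrated with a minimal sketch of the classical single-vector Oja flow, dx/dt = A x - (x^T A x) x, integrated with forward Euler steps. This is not the paper's multiagent model: the symmetric matrix `A` (standing in for a value matrix), the step size, the step count, and all function names are assumptions chosen for illustration.

```python
import math

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(x):
    n = math.sqrt(dot(x, x))
    return [xi / n for xi in x]

def oja_flow(A, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = A x - (x^T A x) x.

    Assumes A is symmetric. The state is renormalized after every
    step so that discretization error does not push it off the
    unit sphere.
    """
    x = normalize(x0)
    for _ in range(steps):
        Ax = mat_vec(A, x)
        rayleigh = dot(x, Ax)  # x^T A x, the Rayleigh quotient on the sphere
        x = [xi + dt * (axi - rayleigh * xi) for xi, axi in zip(x, Ax)]
        x = normalize(x)
    return x

# Hypothetical 2x2 symmetric matrix with eigenvalues 3 and 1;
# its principal eigenvector is (1, 1) / sqrt(2).
A = [[2.0, 1.0], [1.0, 2.0]]
x_final = oja_flow(A, [1.0, 0.0])

principal = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
alignment = abs(dot(x_final, principal))      # close to 1 when aligned
rayleigh_final = dot(x_final, mat_vec(A, x_final))  # close to the top eigenvalue
```

Starting from any unit vector with a nonzero component along the principal eigenvector, the trajectory in this sketch aligns with that eigenvector (up to sign), and the Rayleigh quotient approaches the largest eigenvalue.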
Taxonomy
Topics: Neural Networks and Reservoir Computing · Neural dynamics and brain function · Advanced Memory and Neural Computing
