Multistability of Self-Attention Dynamics in Transformers

Claudio Altafini

arXiv:2511.11553·cs.LG·November 17, 2025

TL;DR

This paper models self-attention in transformers as a continuous-time multiagent dynamical system related to the Oja flow, classifies its equilibria, and analyzes their stability and their alignment with the eigenvectors of the value matrix.

Contribution

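It relates self-attention dynamics to a multiagent version of the Oja flow, classifies the equilibria of the single-head system into four classes (consensus, bipartite consensus, clustering, and polygonal), and studies their stability and alignment with the eigenvectors of the value matrix.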

Findings

01

Multiple asymptotically stable equilibria (consensus, bipartite consensus, and clustering) often coexist in the self-attention dynamics.

02

Consensus and bipartite consensus equilibria are always aligned with eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.

03

The model relates to the multiagent Oja flow, providing new insights into attention mechanisms.

Abstract

In machine learning, a self-attention dynamics is a continuous-time multiagent-like model of the attention mechanisms of transformers. In this paper we show that such dynamics is related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix, which for transformers corresponds to the value matrix. We classify the equilibria of the "single-head" self-attention system into four classes: consensus, bipartite consensus, clustering, and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist in the self-attention dynamics. Interestingly, equilibria from the first two classes are always aligned with the eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.
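
To make the reference to the Oja flow concrete, the following is a minimal sketch of the classical single-vector Oja flow, dx/dt = A x - (x^T A x) x, integrated with forward Euler. It is not the multiagent self-attention model of the paper, whose exact equations are not given on this page. The matrix A is a hypothetical symmetric positive definite stand-in for a value matrix, and the function name, step size, and iteration count are illustrative assumptions.

```python
import numpy as np


def oja_flow(A, x0, dt=0.01, steps=20000):
    """Forward-Euler integration of the classical Oja flow.

    dx/dt = A x - (x^T A x) x

    Assumption: A is symmetric with a positive, simple largest eigenvalue,
    so a generic initial condition converges to a unit principal eigenvector.
    """
    x = x0 / np.linalg.norm(x0)  # start on the unit sphere
    for _ in range(steps):
        Ax = A @ x
        x = x + dt * (Ax - (x @ Ax) * x)
    return x


# Hypothetical symmetric positive definite matrix standing in for a value matrix.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.5]])

rng = np.random.default_rng(0)
x = oja_flow(A, rng.standard_normal(3))

# Reference principal eigenvector; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)
v = eigvecs[:, -1]

# |cos| of the angle between the limit point and the principal eigenvector.
alignment = abs(x @ v)
print("alignment with principal eigenvector:", alignment)
print("norm of limit point:", np.linalg.norm(x))
```

In this single-vector setting the limit is the principal eigenvector up to sign. The abstract's point is that the multiagent self-attention version can also settle on other eigenvectors and on several coexisting stable equilibria, which this sketch does not attempt to reproduce.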

Taxonomy

Topics: Neural Networks and Reservoir Computing · Neural dynamics and brain function · Advanced Memory and Neural Computing