TL;DR
This paper offers an energy-agnostic analysis of self-attention dynamics using Jacobians, revealing how normalization influences oscillations and criticality, and proposing new regularization and monitoring methods.
Contribution
It relaxes traditional energy-based assumptions in self-attention analysis and introduces Jacobian-based tools for understanding and improving inference dynamics.
Findings
Normalization suppresses Jacobian eigenvalues and oscillations.
Normalized dynamics are near a critical state linked to high performance.
Jacobian analysis enables new regularization and monitoring techniques.
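The Jacobian diagnostics listed above can be illustrated on a toy model. What follows is a minimal, hypothetical setup, not the paper's code or exact formulation:

- The state is an (n, d) token matrix.
- One update is a residual multi-head self-attention step whose query-key products are not constrained to be symmetric.
- "Normalization" is assumed to be a parameter-free LayerNorm applied after the residual.

The Jacobian of one update is taken by central finite differences. Two quantities are then used as assumed proxies:

- the spectral radius of its eigenvalues, for criticality;
- their largest imaginary part, for oscillatory modes.

All function names, sizes and weight scales are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Parameter-free LayerNorm over the feature axis (assumption: no learned gain or bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def attention(x, heads):
    # Multi-head self-attention on an (n, d) state.
    # W_Q W_K^T is deliberately NOT constrained to be symmetric (no energy assumed).
    out = np.zeros_like(x)
    for w_q, w_k, w_v in heads:
        scores = (x @ w_q) @ (x @ w_k).T / np.sqrt(w_q.shape[1])
        out += softmax(scores, axis=-1) @ (x @ w_v)
    return out


def step(x, heads, normalize):
    # One residual update of the state, optionally followed by LayerNorm (hypothetical post-norm).
    y = x + attention(x, heads)
    return layer_norm(y) if normalize else y


def jacobian_fd(f, x, h=1e-5):
    # Central finite-difference Jacobian of vec(f(x)) with respect to vec(x).
    x = np.asarray(x, dtype=float)
    fx = np.ravel(f(x))
    jac = np.empty((fx.size, x.size))
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        e = e.reshape(x.shape)
        jac[:, i] = (np.ravel(f(x + e)) - np.ravel(f(x - e))) / (2.0 * h)
    return jac


def spectral_summary(jac):
    # Spectral radius (assumed criticality proxy: close to 1 means near the edge of stability)
    # and largest |imaginary part| of the eigenvalues (rotational, i.e. oscillatory, modes).
    ev = np.linalg.eigvals(jac)
    return float(np.abs(ev).max()), float(np.abs(ev.imag).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_head, n_heads = 4, 8, 4, 2  # hypothetical toy sizes
    heads = [
        (rng.normal(scale=d ** -0.5, size=(d, d_head)),
         rng.normal(scale=d ** -0.5, size=(d, d_head)),
         rng.normal(scale=d ** -0.5, size=(d, d)))
        for _ in range(n_heads)
    ]
    x0 = layer_norm(rng.normal(size=(n, d)))
    for normalize in (False, True):
        jac = jacobian_fd(lambda z: step(z, heads, normalize), x0)
        rho, max_im = spectral_summary(jac)
        print(f"normalize={normalize}: spectral radius={rho:.3f}, max |Im(lambda)|={max_im:.3f}")
```

One structural effect of normalization is visible even here. Because the parameter-free LayerNorm makes every output token zero-mean, the Jacobian of the normalized step has at least n zero eigenvalues, one per token. Random weights like these say nothing about the paper's empirical findings. Running the same comparison on trained weights is what would bear on them.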
Abstract
The theoretical understanding of self-attention (SA) has been steadily progressing. A prominent line of work studies a class of SA layers that admit an energy function decreased by state updates. While this line of work provides valuable insights into inherent biases in signal propagation, it often relies on idealized assumptions or additional constraints not necessarily present in standard SA. Thus, to broaden our understanding, this work aims to relax these energy constraints and provide an energy-agnostic characterization of inference dynamics through dynamical systems analysis. Specifically, we first consider relaxing the symmetry and single-head constraints traditionally required in energy-based formulations. Next, we show that analyzing the Jacobian matrix of the state is highly valuable when investigating more general SA architectures that do not necessarily admit an energy function. It reveals…
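A short derivation shows why the Jacobian is a natural energy-agnostic diagnostic. It uses standard linear algebra and is a hedged illustration, not taken from the paper.

Suppose one state update were a gradient step of size η on some energy E. Its Jacobian is then the symmetric matrix I - η∇²E, so all of its eigenvalues are real. The linearized dynamics therefore have no rotational (complex-conjugate) modes. For a general update F, splitting J into symmetric and antisymmetric parts bounds the oscillatory components via Bendixson's inequality:

```latex
\begin{aligned}
x_{t+1} &= F(x_t) = x_t - \eta\,\nabla E(x_t), \\
J(x) &:= \frac{\partial F}{\partial x}(x) = I - \eta\,\nabla^2 E(x) = J(x)^{\top}, \\
J &= S + A, \qquad S = \tfrac{1}{2}\bigl(J + J^{\top}\bigr), \quad A = \tfrac{1}{2}\bigl(J - J^{\top}\bigr), \\
\lvert \operatorname{Im} \lambda \rvert &\le \lVert A \rVert_2 \quad \text{for every eigenvalue } \lambda \text{ of } J.
\end{aligned}
```

A nonzero antisymmetric part A is therefore necessary for complex eigenvalues. On a simply connected domain, the field x ↦ (x - F(x))/η is the gradient of some energy if and only if its Jacobian (I - J)/η is symmetric everywhere, that is, A vanishes identically. Real eigenvalues do not rule out every oscillation in discrete time: an eigenvalue below zero still produces period-2 sign flips.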