Thermodynamic Isomorphism of Transformers: A Lagrangian Approach to Attention Dynamics
Gunn Kim

TL;DR
This paper introduces a thermodynamic framework for analyzing Transformer attention, revealing that Softmax arises as a free energy minimizer and identifying critical-like fluctuations associated with generalization in neural networks.
Contribution
It develops a Lagrangian-based field-theoretic approach to interpret attention dynamics as a thermodynamic system, linking statistical mechanics with neural network behavior.
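The core variational step behind the Softmax result can be sketched as a Lagrange-multiplier minimization of a Helmholtz free energy over the probability simplex. This is a minimal sketch under stated assumptions. It uses a flat simplex with Shannon entropy rather than the paper's construction on the Fisher information manifold. The identification of the energy as E_i = -q·k_i and the temperature as T = √d_k is an assumed mapping, not quoted from the paper. For T > 0 the functional is strictly convex, so the stationary point is its unique minimum.

```latex
% Helmholtz free energy with Shannon entropy (assumed flat-simplex form)
F[p] = \sum_i p_i E_i - T\,S[p], \qquad S[p] = -\sum_i p_i \ln p_i

% Lagrangian enforcing normalization \sum_i p_i = 1
\mathcal{L}[p,\lambda] = \sum_i p_i E_i + T \sum_i p_i \ln p_i + \lambda\Big(\sum_i p_i - 1\Big)

% Stationarity gives the Boltzmann (canonical) distribution
\frac{\partial \mathcal{L}}{\partial p_i} = E_i + T(\ln p_i + 1) + \lambda = 0
\;\Rightarrow\; p_i = \frac{e^{-E_i/T}}{Z}, \qquad Z = \sum_j e^{-E_j/T}

% Assumed mapping to scaled dot-product attention
E_i = -\,q\cdot k_i,\quad T = \sqrt{d_k}
\;\Rightarrow\; p_i = \operatorname{softmax}_i\!\Big(\frac{q\cdot k_i}{\sqrt{d_k}}\Big), \qquad F_{\min} = -T \ln Z
```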
Findings
Softmax function as a free energy minimizer in the thermodynamic model
Observation of a peak in an effective specific heat (fluctuations of the attention energy) that precedes generalization on the modular addition task
Identification of a crossover behavior rather than a phase transition in attention dynamics
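The Softmax-as-free-energy-minimizer finding can be checked numerically. The sketch below makes two assumptions, neither taken from the paper's code: the energy is E_i = -q·k_i and the temperature is T = √d_k. It confirms three things. The Boltzmann distribution equals ordinary scaled dot-product attention weights. Its free energy equals -T ln Z. Randomly drawn distributions over the same keys never reach a lower free energy, which is the Gibbs inequality F[p] - F[p*] = T·KL(p‖p*) ≥ 0. The function names `softmax` and `free_energy` are illustrative helpers.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def free_energy(p, energies, temperature):
    """Helmholtz free energy F[p] = <E>_p - T * S[p], with Shannon entropy S[p] = -sum_i p_i ln p_i."""
    safe_p = np.where(p > 0, p, 1.0)               # log(1) = 0, so zero-probability entries contribute nothing
    neg_entropy = np.sum(p * np.log(safe_p))       # equals -S[p]
    return float(np.dot(p, energies) + temperature * neg_entropy)


rng = np.random.default_rng(0)
d_k, n_keys = 16, 8
q = rng.normal(size=d_k)                           # one query vector
K = rng.normal(size=(n_keys, d_k))                 # one key vector per row

T = np.sqrt(d_k)                                   # assumed temperature: the usual sqrt(d_k) scale
E = -(K @ q)                                       # assumed energy: E_i = -q . k_i

p_star = softmax(-E / T)                           # Boltzmann distribution exp(-E_i / T) / Z
attn = softmax(K @ q / np.sqrt(d_k))               # ordinary scaled dot-product attention weights
print(np.allclose(p_star, attn))                   # the two distributions coincide

# -T ln Z computed with a stable log-sum-exp
a = -E / T
log_Z = a.max() + np.log(np.sum(np.exp(a - a.max())))
F_star = free_energy(p_star, E, T)
print(np.isclose(F_star, -T * log_Z))              # the minimum value is -T ln Z

# Any other distribution on the simplex has higher free energy (Gibbs inequality)
samples = rng.dirichlet(np.ones(n_keys), size=1000)
print(min(free_energy(p, E, T) for p in samples) >= F_star)
```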
Abstract
We propose an effective field-theoretic framework for analyzing Transformer attention through a thermodynamic lens. By constructing a Lagrangian on the information manifold equipped with the Fisher metric, we show that, within the Shannon–Boltzmann entropy framework, the Softmax function arises as a stationary solution minimizing a Helmholtz free energy functional. This establishes a formal correspondence between scaled dot-product attention and canonical ensemble statistics. Extending this mapping to macroscopic observables, we define an effective specific heat associated with fluctuations of the attention energy landscape. In controlled experiments on the modular addition task (--), we observe a robust peak in this fluctuation measure that consistently precedes the onset of generalization. While no asymptotic power-law divergence is detected in this finite-depth regime,…
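The effective specific heat can be illustrated with the textbook canonical-ensemble formula C = Var_p(E) / T², which equals dU/dT for internal energy U = ⟨E⟩_p. This is a minimal sketch under assumptions, not the paper's definition. The energies are taken as E_ij = -q_i·k_j, the temperature as T = √d_k, and per-query values are combined by a simple mean. The summary does not state how the paper aggregates over tokens, heads, or layers. The helpers `attention_weights`, `specific_heat`, and `mean_energy` are hypothetical names. The script checks the fluctuation form against a finite-difference derivative of U(T).

```python
import numpy as np


def attention_weights(Q, K, temperature):
    """Row-wise softmax of Q K^T / T; each row is one query's Boltzmann distribution over keys."""
    logits = Q @ K.T / temperature
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the exponentials
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def specific_heat(Q, K, temperature):
    """Canonical-ensemble specific heat C = Var_p(E) / T^2, averaged over queries.

    Assumptions (not taken from the paper): energy E_ij = -q_i . k_j,
    and per-query values are combined by a simple mean.
    """
    E = -(Q @ K.T)
    p = attention_weights(Q, K, temperature)
    mean_E = np.sum(p * E, axis=1)
    var_E = np.sum(p * (E - mean_E[:, None]) ** 2, axis=1)   # energy fluctuations per query
    return float(np.mean(var_E) / temperature**2)


def mean_energy(Q, K, temperature):
    """Internal energy U(T) = <E>_p, averaged over queries."""
    E = -(Q @ K.T)
    p = attention_weights(Q, K, temperature)
    return float(np.mean(np.sum(p * E, axis=1)))


rng = np.random.default_rng(1)
d_k = 16
Q = rng.normal(size=(4, d_k))                      # 4 query vectors
K = rng.normal(size=(10, d_k))                     # 10 key vectors
T = np.sqrt(d_k)

C = specific_heat(Q, K, T)
h = 1e-4
dU_dT = (mean_energy(Q, K, T + h) - mean_energy(Q, K, T - h)) / (2 * h)   # central difference
print(C, dU_dT)  # fluctuation form and derivative form agree up to finite-difference error
```

In a training run, one might evaluate such a quantity on each checkpoint's query and key activations and plot it against training steps to look for the peak the abstract describes. Any such protocol here is illustrative.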
Taxonomy
Topics: Quantum many-body systems · Statistical Mechanics and Entropy · Advanced Thermodynamics and Statistical Mechanics
