Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer
Satadeep Bhattacharjee, Seung-Cheol Lee

TL;DR
This paper applies a physics-inspired spin-bath model to analyze GPT-2's attention mechanism, deriving Hamiltonians and phase boundaries that predict token selection, and empirically validating the model's relevance to language generation.
Contribution
It provides the first empirical validation of the spin-bath analogy in a large language model by deriving Hamiltonians and demonstrating causal effects through targeted ablations.
Findings
Strong negative correlation between theoretical logit gaps and empirical token rankings.
Ablation of spin-bath aligned heads shifts output probabilities as predicted.
Hamiltonian analysis offers a physics-grounded way to interpret attention mechanisms.
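The first finding is a statement about rank correlation, and a small sketch may make its sign convention concrete. The summary does not say which correlation statistic the authors report, so Spearman's rho is an assumption here, and the numbers below are synthetic placeholders rather than results from the paper. A larger theoretical gap should put the target token nearer the top of the ranking (a smaller rank number), which is why the correlation comes out negative.

```python
import numpy as np

def rank(values):
    """Return ranks 1..n for distinct values (no tie handling in this sketch)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

# Synthetic illustration only: hypothetical per-head logit gaps and the
# empirical rank of the target token (1 = most probable).
gaps = [3.0, 1.5, 0.2, -0.5, -2.0]
token_ranks = [1, 2, 5, 9, 40]

rho = spearman(gaps, token_ranks)
print(f"Spearman rho = {rho:.3f}")
```

On real data the gaps would come from the per-head Hamiltonians and the ranks from the model's output distribution; ties would need an averaged-rank treatment that this sketch omits.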
Abstract
The recently proposed physics-based framework by Huo and Johnson~\cite{huo2024capturing} models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians, we obtain analytic phase boundaries and logit gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation on 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model's empirical token rankings. Targeted ablations further show that…
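As a rough illustration of the Query-Key step described in the abstract, the sketch below builds a per-head bilinear coupling matrix and uses it to compare two candidate tokens. Several things here are assumptions, not the authors' method: the exact Hamiltonian and logit-gap definitions are not given in this summary, so the code uses the standard scaled dot-product attention score as a stand-in; the dimensions are those of GPT-2 small (12 layers, 12 heads, width 768), chosen because 12 x 12 matches the 144 heads mentioned; and the weights are random placeholders, not weights extracted from GPT-2. The names `head_coupling` and `score_gap` are hypothetical.

```python
import numpy as np

# Assumed GPT-2 small dimensions: 12 layers x 12 heads = 144 heads.
N_LAYER, N_HEAD, D_MODEL = 12, 12, 768
D_HEAD = D_MODEL // N_HEAD  # 64

def head_coupling(w_q, w_k):
    """Bilinear form J = W_Q W_K^T / sqrt(d_head), so that the attention
    score between a query-side vector x and a key-side vector y is x @ J @ y."""
    return (w_q @ w_k.T) / np.sqrt(w_q.shape[1])

def score_gap(J, query_vec, key_a, key_b):
    """Pre-softmax score difference between candidates a and b.
    Positive means this head favours a over b for the given query."""
    return float(query_vec @ J @ (key_a - key_b))

rng = np.random.default_rng(0)

# Random stand-ins for one head's query and key projections (assumption:
# real values would be sliced from the model's attention weights).
w_q = rng.normal(scale=0.02, size=(D_MODEL, D_HEAD))
w_k = rng.normal(scale=0.02, size=(D_MODEL, D_HEAD))
J = head_coupling(w_q, w_k)

# Random stand-ins for a context vector and two candidate token vectors.
x, a, b = rng.normal(size=(3, D_MODEL))
gap = score_gap(J, x, a, b)

# Sanity check: the bilinear form reproduces the usual attention scores.
score_a = (x @ w_q) @ (a @ w_k) / np.sqrt(D_HEAD)
score_b = (x @ w_q) @ (b @ w_k) / np.sqrt(D_HEAD)
assert np.isclose(gap, score_a - score_b)
print(f"coupling shape: {J.shape}, score gap: {gap:+.4f}")
```

In a spin-style reading one might take the energy of a query-key pair to be the negative of `x @ J @ y`, so that a larger score gap corresponds to a lower energy for the favoured token; that sign convention is also an assumption of this sketch.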
