Momentum Attention: The Physics of In-Context Learning and Spectral Forensics for Mechanistic Interpretability
Kingsuk Maitra

TL;DR
This paper introduces Momentum Attention, a physics-inspired augmentation for Transformers that enables single-layer induction and spectral analysis, bridging mechanistic interpretability with physical principles and signal processing.
Contribution
It presents Momentum Attention, a symplectic augmentation of attention that is dual to a high-pass filter, giving the model direct access to velocity and enabling spectral forensics in Transformer models.
Findings
Momentum Attention enables single-layer induction.
The model surpasses expectations on induction-heavy tasks.
A scaling law for momentum-depth fungibility is established.
Abstract
The Mechanistic Interpretability (MI) program has mapped the Transformer as a precise computational graph. We extend this graph with a conservation law and time-varying AC dynamics, viewing it as a physical circuit. We introduce Momentum Attention, a symplectic augmentation embedding physical priors via the kinematic difference operator, implementing the symplectic shear on queries and keys. We identify a fundamental Symplectic-Filter Duality: the physical shear is mathematically equivalent to a High-Pass Filter. This duality is our cornerstone contribution: by injecting kinematic momentum, we sidestep the topological depth constraint for induction head formation. While standard architectures require two layers for induction from static positions, our extension grants direct access to velocity, enabling Single-Layer…
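The shear-as-filter idea in the abstract can be illustrated with a small numerical sketch. The abstract's formulas did not survive in this text, so the definitions here are assumptions rather than the paper's own: the kinematic difference is taken to be the backward difference p_t = x_t - x_{t-1} (with p_0 = 0), and the shear is taken to be x_t + gamma * p_t for a hypothetical coupling `gamma`. Under those assumptions the shear is the two-tap FIR filter [1 + gamma, -gamma], whose gain rises from 1 at zero frequency to 1 + 2*gamma at the Nyquist frequency; the difference term is the high-pass component.

```python
import numpy as np

def kinematic_difference(x):
    """Backward difference along the sequence axis (axis 0).

    Assumed form: p_t = x_t - x_{t-1}, with p_0 = 0 for the first token.
    """
    p = np.zeros_like(x)
    p[1:] = x[1:] - x[:-1]
    return p

def momentum_shear(x, gamma):
    """Assumed shear: add gamma times the 'momentum' back onto the signal."""
    return x + gamma * kinematic_difference(x)

def shear_gain(omega, gamma):
    """Magnitude response of the equivalent FIR filter h = [1 + gamma, -gamma]."""
    return np.abs(1.0 + gamma * (1.0 - np.exp(-1j * omega)))

# Toy "queries": 8 positions, 4 channels (illustrative shapes only).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4))
gamma = 0.5  # hypothetical coupling strength

q_sheared = momentum_shear(q, gamma)

# For t >= 1 the shear equals a two-tap filter applied along the sequence.
fir = (1.0 + gamma) * q[1:] - gamma * q[:-1]

gain_dc = shear_gain(0.0, gamma)        # slow (static) content passes unchanged
gain_nyquist = shear_gain(np.pi, gamma) # fastest-varying content is amplified
```

In a full attention layer, the same shear would be applied to the query and key projections before the dot product; that wiring is omitted here to keep the sketch focused on the filter equivalence.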
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Generative Adversarial Networks and Image Synthesis
