Momentum Point-Perplexity Mechanics in Large Language Models
Lorenzo Tomaz, Judd Rosenblatt, Thomas Berry Jones, Diogo Schwerz de Lucena

TL;DR
This paper introduces a physics-inspired framework to analyze and control large language models' internal states, leading to improved interpretability, alignment, and output quality through a novel Jacobian steering method.
Contribution
It proposes a new physics-based perspective on transformer dynamics and develops Jacobian steering for targeted, minimal perturbations to guide model outputs.
Findings
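An energy-like quantity, combining the rate of change in hidden states with next-token certainty, remains nearly constant during inference
Random-weight models conserve this energy more tightly than pre-trained ones
In two tested models, Jacobian steering produced continuations rated higher in semantic quality than the models' natural outputs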
Abstract
We take a physics-based approach to studying how the internal hidden states of large language models change from token to token during inference. Across 20 open-source transformer models (135M-3B parameters), we find that a quantity combining the rate of change in hidden states and the model's next-token certainty, analogous to energy in physics, remains nearly constant. Random-weight models conserve this "energy" more tightly than pre-trained ones, while training shifts models into a faster, more decisive regime with greater variability. Using this "log-Lagrangian" view, we derive a control method called Jacobian steering, which perturbs hidden states in the minimal way needed to favor a target token. This approach maintained near-constant energy in two tested models and produced continuations rated higher in semantic quality than the models' natural outputs. Viewing transformers…
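The abstract describes Jacobian steering only at a high level: perturb the hidden state in the minimal way needed to favor a target token. The sketch below illustrates one simple reading of that idea and is not the authors' implementation. It assumes a purely linear readout (logits = W h), so the Jacobian of the target logit with respect to the hidden state is just the target's row of W, and the smallest L2-norm perturbation that raises that logit by a chosen amount lies along that row. The names `jacobian_steer`, `W`, `h`, and `delta` are hypothetical, as is the toy 3-token vocabulary.

```python
import math


def matvec(W, h):
    """Logits of a linear readout: one dot product per vocabulary row."""
    return [sum(w_i * h_i for w_i, h_i in zip(row, h)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def jacobian_steer(W, h, target, delta):
    """Minimal-L2-norm shift of h that raises the target logit by delta.

    Assumption: logits = W h, so d(logit_target)/dh is the row W[target].
    Among all dh with W[target] . dh = delta, the one with the smallest
    norm is dh = delta * g / ||g||^2, where g = W[target].
    """
    g = W[target]
    norm_sq = sum(v * v for v in g)
    return [h_i + delta * g_i / norm_sq for h_i, g_i in zip(h, g)]


# Toy example: 3-token vocabulary, 2-dimensional hidden state (made-up values).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
h = [0.5, 0.2]

h_new = jacobian_steer(W, h, target=1, delta=2.0)
logits_before = matvec(W, h)
logits_after = matvec(W, h_new)
p_before = softmax(logits_before)[1]
p_after = softmax(logits_after)[1]

print("target logit change:", logits_after[1] - logits_before[1])
print("target probability before/after:", p_before, p_after)
```

In a real transformer the map from hidden state to logits is nonlinear past the perturbed layer, so the Jacobian would have to be computed by automatic differentiation and the step would hold only to first order. The sketch also leaves out the energy-like quantity, whose exact definition the abstract does not give.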
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Generative Adversarial Networks and Image Synthesis
