Transformers converge to invariant algorithmic cores
Joshua S. Schiffman

TL;DR
This paper identifies low-dimensional invariant structures in transformers that are necessary and sufficient for task performance. The same algorithmic cores recur across independent training runs and model scales, which gives interpretability work a stable target that does not depend on the weights of any single run.
Contribution
It extracts and characterizes algorithmic cores, compact subspaces necessary and sufficient for task performance, and shows that they are consistent across training runs and scales.
Findings
Independently trained transformers learn different weights but converge to the same algorithmic cores.
Markov-chain transformers embed 3D cores in nearly orthogonal subspaces yet recover identical transition spectra.
GPT-2's subject-verb agreement is governed by a single, manipulable axis.
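The Markov-chain finding rests on a basic linear-algebra fact: the same small operator can sit in two almost unrelated subspaces of a large space and still have the same nonzero eigenvalues. The sketch below illustrates only that fact and is not the paper's extraction method. The transition matrix P, the ambient dimension d, and the helpers embed and top_spectrum are all hypothetical choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov transition matrix (rows sum to 1); not from the paper.
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

d = 256  # assumed ambient ("model") dimension, chosen only for illustration


def embed(P, d, rng):
    """Place P inside a random 3D subspace of R^d; return (operator, basis)."""
    # Reduced QR of a Gaussian matrix gives d x 3 orthonormal columns.
    Q, _ = np.linalg.qr(rng.standard_normal((d, P.shape[0])))
    return Q @ P @ Q.T, Q


def top_spectrum(A, k):
    """Return the k largest-magnitude eigenvalues of A, sorted ascending."""
    ev = np.linalg.eigvals(A)
    ev = ev[np.argsort(-np.abs(ev))][:k]
    return np.sort(ev.real)


A1, Q1 = embed(P, d, rng)  # stand-in for "training run 1"
A2, Q2 = embed(P, d, rng)  # stand-in for "training run 2"

# Cosines of the principal angles between the two subspaces.
max_cos = np.linalg.svd(Q1.T @ Q2, compute_uv=False).max()

spec1 = top_spectrum(A1, 3)
spec2 = top_spectrum(A2, 3)

print("largest principal-angle cosine:", max_cos)
print("spectrum, run 1:", spec1)
print("spectrum, run 2:", spec2)
```

Because each basis Q has orthonormal columns, Q P Qᵀ has the same nonzero eigenvalues as P, so the two embedded operators agree in spectrum even though the largest principal-angle cosine between their subspaces is small. Comparing spectra rather than raw weights is one way to test whether two runs implement the same operator.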
Abstract
Large language models exhibit sophisticated capabilities, yet understanding how they work internally remains a central challenge. A fundamental obstacle is that training selects for behavior, not circuitry, so many weight configurations can implement the same function. Which internal structures reflect the computation, and which are accidents of a particular training run? This work extracts algorithmic cores: compact subspaces necessary and sufficient for task performance. Independently trained transformers learn different weights but converge to the same cores. Markov-chain transformers embed 3D cores in nearly orthogonal subspaces yet recover identical transition spectra. Modular-addition transformers discover compact cyclic operators at grokking that later inflate, yielding a predictive model of the memorization-to-generalization transition. GPT-2 language models govern subject-verb…
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Machine Learning and Algorithms
