Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks
Anish Lakkapragada

TL;DR
This paper applies singular learning theory (SLT), a physics-inspired framework, to empirically study grokking and other phase transitions in neural networks, focusing on the SLT free energy and on how the local learning coefficient scales with problem difficulty.
Contribution
It demonstrates the practical utility of singular learning theory in understanding neural network phase transitions and scaling laws through empirical experiments on toy models.
Findings
SLT free energy aligns with Arrhenius-style rate hypothesis.
Local learning coefficient scales with problem difficulty.
Some observed scaling laws match theoretical predictions, others deviate.
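As a rough illustration of the first finding, the sketch below shows what checking an Arrhenius-style relation could look like: if a transition rate behaves like A * exp(-beta * barrier), then log(rate) is linear in the barrier and a straight-line fit recovers beta. This is not the paper's procedure, which this summary does not describe. The function names (`slt_free_energy`, `fit_arrhenius`) and the numbers are hypothetical, and the free energy uses only the standard leading-order SLT asymptotic n * L_n(w*) + lambda * log(n), where lambda is the learning coefficient.

```python
import numpy as np


def slt_free_energy(n, loss, llc):
    """Leading-order SLT free energy: n * L_n(w*) + lambda * log(n).

    Lower-order terms of the asymptotic expansion are ignored in this sketch.
    """
    return n * loss + llc * np.log(n)


def fit_arrhenius(barriers, rates):
    """Fit log(rate) = log(A) - beta * barrier by least squares.

    Returns the prefactor A and the inverse-temperature-like slope beta.
    """
    slope, intercept = np.polyfit(barriers, np.log(rates), 1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic, made-up values for illustration only (not results from the paper).
barriers = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical free-energy barriers
rates = 0.5 * np.exp(-1.5 * barriers)       # rates generated with A=0.5, beta=1.5

prefactor, beta = fit_arrhenius(barriers, rates)
print(f"A = {prefactor:.3f}, beta = {beta:.3f}")
```

In practice the barrier would be a difference between free energies such as `slt_free_energy(n, loss_b, llc_b) - slt_free_energy(n, loss_a, llc_a)` for two candidate phases; how well measured rates follow the fitted line is what the hypothesis would stand or fall on.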
Abstract
Classical statistical inference and learning theory often fail to explain the success of modern neural networks. A key reason is that these models are non-identifiable (singular), violating core assumptions behind PAC bounds and asymptotic normality. Singular learning theory (SLT), a physics-inspired framework grounded in algebraic geometry, has gained popularity for its ability to close this theory-practice gap. In this paper, we empirically study SLT in toy settings relevant to interpretability and phase transitions. First, we understand the SLT free energy by testing an Arrhenius-style rate hypothesis using both a grokking modulo-arithmetic model and Anthropic's Toy Models of Superposition. Second, we understand the local learning coefficient by measuring how it scales with problem difficulty across several controlled network families (polynomial…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications
