Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition
Zhongtian Chen, Edmund Lau, Jake Mendel, Susan Wei, Daniel Murfet

TL;DR
This paper uses Singular Learning Theory (SLT) to study phase transitions in the Toy Model of Superposition (TMS). It identifies regular $k$-gon critical points that govern both phase transitions in the Bayesian posterior and the behavior of SGD training, adding evidence for a sequential learning mechanism.
Contribution
The paper derives a closed-form formula for the theoretical loss of the TMS and, for two hidden dimensions, identifies regular $k$-gons as critical points that determine phase transitions in the Bayesian posterior and the behavior of SGD training.
Findings
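In the case of two hidden dimensions, regular $k$-gons are critical points of the model.
The local learning coefficient of these $k$-gons determines phase transitions in the Bayesian posterior as a function of training sample size.
Both SGD and Bayesian learning can be characterized as a journey through parameter space, consistent with a sequential learning mechanism.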
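The role of the local learning coefficient can be read through the standard free-energy asymptotic of SLT. The block below is a sketch rather than a quotation from the paper: the symbols $F_n$, $L_n$, $\lambda_\alpha$, $w^*_\alpha$ and the neighborhood $\mathcal{W}_\alpha$ are notation assumed here for illustration.

```latex
% Local free energy of a neighborhood \mathcal{W}_\alpha of a critical point w^*_\alpha,
% for sample size n, empirical loss L_n and local learning coefficient \lambda_\alpha:
F_n(\mathcal{W}_\alpha) \approx n\, L_n(w^*_\alpha) + \lambda_\alpha \log n .

% The posterior prefers phase \beta over phase \alpha when F_n(\mathcal{W}_\beta) < F_n(\mathcal{W}_\alpha).
% If \beta has lower loss but a larger learning coefficient
% (L_n(w^*_\beta) < L_n(w^*_\alpha),\ \lambda_\beta > \lambda_\alpha),
% the preference switches from \alpha to \beta once
\frac{n}{\log n} > \frac{\lambda_\beta - \lambda_\alpha}{L_n(w^*_\alpha) - L_n(w^*_\beta)} .
```

Under this reading, a small sample size favors the simpler phase (small $\lambda$), and the posterior moves to lower-loss, more complex phases as $n$ grows.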
Abstract
We investigate phase transitions in a Toy Model of Superposition (TMS) using Singular Learning Theory (SLT). We derive a closed formula for the theoretical loss and, in the case of two hidden dimensions, discover that regular $k$-gons are critical points. We present supporting theory indicating that the local learning coefficient (a geometric invariant) of these $k$-gons determines phase transitions in the Bayesian posterior as a function of training sample size. We then show empirically that the same $k$-gon critical points also determine the behavior of SGD training. The picture that emerges adds evidence to the conjecture that the SGD learning trajectory is subject to a sequential learning mechanism. Specifically, we find that the learning process in TMS, be it through SGD or Bayesian learning, can be characterized by a journey through parameter space from regions of high loss and…
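To make the $k$-gon geometry concrete, here is a minimal NumPy sketch. It rests on assumptions not stated in this summary: the usual TMS form $f(x) = \mathrm{ReLU}(W^\top W x + b)$ with a squared-error reconstruction loss, and inputs with a single nonzero coordinate drawn uniformly from $[0, 1]$. The helper names (`kgon_weights`, `sample_inputs`, `tms_forward`, `empirical_loss`) are hypothetical, and `radius` and `b` are free illustrative parameters, not the critical values derived in the paper.

```python
import numpy as np


def kgon_weights(k, c, radius=1.0):
    """Return a 2 x c matrix whose first k columns are the vertices of a
    regular k-gon of the given radius; the remaining c - k columns are zero."""
    if not 1 <= k <= c:
        raise ValueError("need 1 <= k <= c")
    angles = 2.0 * np.pi * np.arange(k) / k
    W = np.zeros((2, c))
    W[0, :k] = radius * np.cos(angles)
    W[1, :k] = radius * np.sin(angles)
    return W


def sample_inputs(n, c, rng):
    """Assumed input distribution: one active feature per sample,
    with its magnitude drawn uniformly from [0, 1]."""
    idx = rng.integers(0, c, size=n)
    scale = rng.uniform(0.0, 1.0, size=n)
    x = np.zeros((n, c))
    x[np.arange(n), idx] = scale
    return x


def tms_forward(W, b, x):
    """Batched ReLU(W^T W x + b); x has shape (n, c). W^T W is symmetric,
    so right-multiplying the batch by it matches the column-vector form."""
    return np.maximum(x @ (W.T @ W) + b, 0.0)


def empirical_loss(W, b, x):
    """Mean squared reconstruction error over the batch."""
    residual = x - tms_forward(W, b, x)
    return float(np.mean(np.sum(residual ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = 6
    x = sample_inputs(10_000, c, rng)
    b = np.zeros(c)  # illustrative bias, not a fitted value
    for k in range(1, c + 1):
        print(k, empirical_loss(kgon_weights(k, c), b, x))
```

Because the model depends on $W$ only through $W^\top W$, rotating a $k$-gon in the hidden plane leaves the loss unchanged, so each $k$-gon stands for a whole orbit of equivalent parameters.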
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Algorithms · Bayesian Methods and Mixture Models
Methods: Stochastic Gradient Descent
