A Basin-Selection Perspective on Grokking via Singular Learning Theory
Ben Cullen, Sergio Estan-Ruiz, Riya Danait, Jiayi Li

TL;DR
This paper uses Singular Learning Theory to analyze grokking, explaining the transition from memorization to generalization as a basin shift in the loss landscape driven by local degeneracy.
Contribution
It introduces a basin-selection perspective on grokking, deriving LLC formulas for quadratic networks and linking LLC trajectories to generalization onset.
Findings
LLC trajectories track the onset of generalization.
Lower-LLC basins are statistically preferred: SLT associates them with higher posterior mass concentration and lower expected generalization error.
Analytic LLC formulas support the basin-shift explanation.
Abstract
Grokking, the abrupt transition from memorization to generalisation after extended training, suggests the presence of competing solution basins with distinct statistical properties. We study this phenomenon through the lens of Singular Learning Theory (SLT), a Bayesian framework that characterizes the geometry of the loss landscape. The key measure is the local learning coefficient (LLC), which quantifies the local degeneracy of the loss surface. SLT links lower-LLC basins to higher posterior mass concentration and lower expected generalisation error. Leveraging SLT, we develop a basin-selection perspective on grokking in quadratic networks: LLC ranks competing near-zero-loss basins by statistical preference, while the training-time transition between them is governed by optimisation dynamics. In this view, grokking corresponds to a transition from a higher-LLC (memorising) basin to a…
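To make "local degeneracy" concrete, the following is a minimal sketch of the volume-scaling view of the learning coefficient from SLT: near a minimum, the volume of parameters with loss below eps scales roughly as eps**lambda, so a flatter (more degenerate) basin has a smaller lambda. The one-dimensional toy losses `w**2` and `w**4`, the grid, and the helper name `volume_scaling_exponent` are illustrative assumptions; they are not the paper's quadratic networks or its analytic LLC formulas.

```python
import numpy as np

def volume_scaling_exponent(loss, eps_values, grid):
    """Fit lambda in V(eps) ~ c * eps**lambda, where V(eps) is the
    fraction of grid points whose loss is below eps (hypothetical helper)."""
    values = loss(grid)
    volumes = np.array([(values < eps).mean() for eps in eps_values])
    # The slope of log V against log eps is the scaling exponent.
    slope, _ = np.polyfit(np.log(eps_values), np.log(volumes), 1)
    return slope

# Assumed toy setup: a dense grid on [-1, 1] and a range of small thresholds.
grid = np.linspace(-1.0, 1.0, 2_000_001)
eps_values = np.logspace(-6, -2, 9)

# Regular minimum (non-degenerate) versus a flatter, degenerate minimum.
lam_regular = volume_scaling_exponent(lambda w: w**2, eps_values, grid)
lam_degenerate = volume_scaling_exponent(lambda w: w**4, eps_values, grid)

print(f"w**2 basin: lambda ~ {lam_regular:.3f}")
print(f"w**4 basin: lambda ~ {lam_degenerate:.3f}")
```

In this toy setting the fitted exponents should come out close to the analytic values 1/2 and 1/4, so the flatter `w**4` basin has the lower coefficient. That ordering is the sense in which a lower-LLC basin is ranked as statistically preferred; it says nothing about how or when training moves between basins, which the abstract attributes to optimisation dynamics.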
