A Basin-Selection Perspective on Grokking via Singular Learning Theory

Ben Cullen; Sergio Estan-Ruiz; Riya Danait; Jiayi Li

arXiv:2603.01192·stat.ML·May 8, 2026

TL;DR

This paper uses Singular Learning Theory to analyze grokking, explaining the transition from memorization to generalization as a basin shift in the loss landscape driven by local degeneracy.

Contribution

It introduces a basin-selection perspective on grokking, deriving LLC formulas for quadratic networks and linking LLC trajectories to generalization onset.

Findings

01

LLC trajectories track the onset of generalization.

02

Lower-LLC basins are statistically preferred: SLT links them to higher posterior mass concentration and lower expected generalization error.

03

Analytic LLC formulas support the basin-shift explanation.

Abstract

Grokking, the abrupt transition from memorisation to generalisation after extended training, suggests the presence of competing solution basins with distinct statistical properties. We study this phenomenon through the lens of Singular Learning Theory (SLT), a Bayesian framework that characterises the geometry of the loss landscape. The key measure is the local learning coefficient (LLC), which quantifies the local degeneracy of the loss surface. SLT links lower-LLC basins to higher posterior mass concentration and lower expected generalisation error. Leveraging SLT, we develop a basin-selection perspective on grokking in quadratic networks: LLC ranks competing near-zero-loss basins by statistical preference, while the training-time transition between them is governed by optimisation dynamics. In this view, grokking corresponds to a transition from a higher-LLC (memorising) basin to a…
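
To make the ranking claim concrete, the following minimal sketch uses the standard leading-order SLT free-energy expansion, F_n ≈ n·L + λ·log n, where L is the loss at the basin and λ is its LLC; lower-order terms are dropped. It is an illustration under stated assumptions, not the paper's method: the function names and the two LLC values are hypothetical and chosen only to show why, at equal near-zero loss, the lower-LLC basin carries more posterior mass.

```python
import math

def free_energy(n, loss, llc):
    """Leading-order local free energy of a basin: n * loss + llc * log(n).

    Lower-order terms of the asymptotic expansion are ignored.
    """
    return n * loss + llc * math.log(n)

def posterior_log_odds(n, basin_a, basin_b):
    """Approximate log ratio of posterior mass, basin_a over basin_b.

    Each basin is a (loss, llc) pair. Posterior mass scales like
    exp(-free_energy), so a positive result favours basin_a.
    """
    return free_energy(n, *basin_b) - free_energy(n, *basin_a)

# Hypothetical (loss, LLC) pairs; these numbers are NOT taken from the paper.
memorising = (0.0, 12.0)    # higher LLC: less degenerate basin
generalising = (0.0, 4.0)   # lower LLC: more degenerate basin

n = 1000  # number of training samples (assumed)
log_odds = posterior_log_odds(n, generalising, memorising)
print(f"log posterior odds (generalising vs memorising): {log_odds:.2f}")
```

With equal losses the log odds reduce to (λ_mem − λ_gen)·log n, so the preference for the lower-LLC basin grows with the sample size. This ranking says nothing about when training moves between basins, which the abstract attributes to optimisation dynamics.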
