Grokking at the Edge of Linear Separability
Alon Beck, Noam Levi, Yohai Bar-Sinai

TL;DR
This paper studies grokking in a simple binary logistic classification task, shows that it arises when the problem parameters are close to a critical point where training dynamics linger, and draws parallels to critical phenomena in physical systems.
Contribution
It provides analytical and empirical insight into the mechanism of grokking in a minimal model, linking it to proximity to a critical point and to the implicit bias of gradient descent.
Findings
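Grokking can arise in logistic regression when the training dataset is nearly linearly separable from the origin and the noise in the perpendicular directions is strong.
Near the critical point, flat directions in the loss landscape, where the gradient is nearly zero, make the training dynamics linger for arbitrarily long times.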
Grokking is related to critical phenomena observed in physical systems.
Abstract
We investigate the phenomenon of grokking -- delayed generalization accompanied by non-monotonic test loss behavior -- in a simple binary logistic classification task, for which "memorizing" and "generalizing" solutions can be strictly defined. Surprisingly, we find that grokking arises naturally even in this minimal model when the parameters of the problem are close to a critical point, and provide both empirical and analytical insights into its mechanism. Concretely, by appealing to the implicit bias of gradient descent, we show that logistic regression can exhibit grokking when the training dataset is nearly linearly separable from the origin and there is strong noise in the perpendicular directions. The underlying reason is that near the critical point, "flat" directions in the loss landscape with nearly zero gradient cause training dynamics to linger for arbitrarily long times near…
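The setting described in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the data model (one signal coordinate equal to `y * margin`, Gaussian noise in the remaining coordinates), the function names, and all parameter values are hypothetical choices made for this example. It trains bias-free logistic regression with plain gradient descent and records training and test loss so the two curves can be compared over time.

```python
import math
import random

def make_data(n, d, margin, noise, rng):
    """Toy data (an assumption, not the paper's setup): label y = +/-1,
    coordinate 0 carries the signal y * margin, the rest is Gaussian noise."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.choice([-1.0, 1.0])
        x = [y * margin] + [rng.gauss(0.0, noise) for _ in range(d - 1)]
        xs.append(x)
        ys.append(y)
    return xs, ys

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def logistic_loss(w, xs, ys):
    # Mean of log(1 + exp(-y * <w, x>)); no bias term, so the
    # decision boundary is a hyperplane through the origin.
    return sum(softplus(-y * dot(w, x)) for x, y in zip(xs, ys)) / len(xs)

def accuracy(w, xs, ys):
    return sum(1 for x, y in zip(xs, ys) if y * dot(w, x) > 0) / len(xs)

def train(xs, ys, xs_test, ys_test, lr=0.05, steps=500, log_every=100):
    """Full-batch gradient descent from w = 0.
    Returns the final weights and a list of
    (step, train_loss, test_loss, test_accuracy) tuples."""
    d = len(xs[0])
    w = [0.0] * d
    history = []
    for t in range(steps + 1):
        if t % log_every == 0:
            history.append((t,
                            logistic_loss(w, xs, ys),
                            logistic_loss(w, xs_test, ys_test),
                            accuracy(w, xs_test, ys_test)))
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            z = y * dot(w, x)
            s = 0.5 * (1.0 - math.tanh(0.5 * z))  # sigmoid(-z), overflow-safe
            for j in range(d):
                grad[j] -= y * x[j] * s / len(xs)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w, history

if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = make_data(20, 10, 1.0, 2.0, rng)
    xs_test, ys_test = make_data(100, 10, 1.0, 2.0, rng)
    _, history = train(xs, ys, xs_test, ys_test)
    for step, train_loss, test_loss, test_acc in history:
        print(step, round(train_loss, 4), round(test_loss, 4), test_acc)
```

Whether this toy run shows delayed generalization depends on the chosen `margin`, `noise`, dimension, and sample size; the values here are illustrative only. Varying the ratio of dimension to sample count and the noise scale is one way to probe how close the training set is to being linearly separable from the origin.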
Taxonomy
Topics: Statistical Mechanics and Entropy · Theoretical and Computational Physics · Stochastic Gradient Optimization Techniques
