Grokking Explained: A Statistical Phenomenon
Breno W. Carvalho, Artur S. d'Avila Garcez, Luís C. Lamb, and Emílio Vital Brazil

TL;DR
This paper investigates the grokking phenomenon in deep learning, finds that a distribution shift between training and test data is a key factor in its emergence, and introduces two synthetic datasets to analyze its causes and mechanisms.
Contribution
It formalizes grokking, demonstrates its relation to distribution shifts, and shows it can occur with dense data and minimal tuning, advancing understanding of this phenomenon.
Findings
Grokking is linked to distribution shifts between training and test data.
Small-sampling facilitates grokking but is not its primary cause.
Grokking can occur with dense data and minimal hyper-parameter tuning.
Abstract
Grokking, or delayed generalization, is an intriguing learning phenomenon where test set loss decreases sharply only after a model's training set loss has converged. This challenges conventional understanding of the training dynamics in deep learning networks. In this paper, we formalize and investigate grokking, highlighting that a key factor in its emergence is a distribution shift between training and test data. We introduce two synthetic datasets specifically designed to analyze grokking. One dataset examines the impact of limited sampling, and the other investigates transfer learning's role in grokking. By inducing distribution shifts through controlled imbalanced sampling of sub-categories, we systematically reproduce the phenomenon, demonstrating that while small-sampling is strongly associated with grokking, it is not its cause. Instead, small-sampling serves as a convenient…
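The abstract's idea of inducing a distribution shift through controlled imbalanced sampling of sub-categories can be illustrated with a minimal sketch. This is not the authors' dataset or code: the four sub-categories, the per-sub-category training fractions, and the helper names `imbalanced_split` and `subcat_dist` are all hypothetical, chosen only to show how over-sampling some sub-categories in training leaves the test set with a different sub-category mix.

```python
import random
from collections import Counter

def imbalanced_split(subcats, train_fracs, rng):
    """Send each sample to training with a probability that depends on its
    sub-category (hypothetical scheme, not the paper's exact procedure)."""
    train_idx, test_idx = [], []
    for i, s in enumerate(subcats):
        if rng.random() < train_fracs[s]:
            train_idx.append(i)
        else:
            test_idx.append(i)
    return train_idx, test_idx

def subcat_dist(subcats, idx, k):
    """Empirical distribution over the k sub-categories within a split."""
    counts = Counter(subcats[i] for i in idx)
    return [counts[s] / len(idx) for s in range(k)]

rng = random.Random(0)
K = 4                                      # assumed number of sub-categories
subcats = [rng.randrange(K) for _ in range(10_000)]
labels = [s // 2 for s in subcats]         # two classes, two sub-categories each

# Assumed fractions: sub-categories 0 and 2 dominate training, 1 and 3 are rare.
train_fracs = [0.9, 0.05, 0.9, 0.05]
train_idx, test_idx = imbalanced_split(subcats, train_fracs, rng)

p_train = subcat_dist(subcats, train_idx, K)
p_test = subcat_dist(subcats, test_idx, K)

# Total variation distance between the two sub-category distributions,
# used here as a simple measure of the induced shift.
tv = 0.5 * sum(abs(a - b) for a, b in zip(p_train, p_test))
print("train:", [round(p, 3) for p in p_train])
print("test: ", [round(p, 3) for p in p_test])
print("total variation distance:", round(tv, 3))
```

Both classes appear in both splits, so the class labels stay roughly balanced while the sub-category mix differs sharply between training and test data. Setting every entry of `train_fracs` to the same value would remove the shift, which gives a simple control condition.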
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Sparse Evolutionary Training
