Exploring Grokking: Experimental and Mechanistic Investigations
Hu Qiye, Zhou Hao, Yu RuoXi

TL;DR
This paper investigates the grokking phenomenon in neural networks, combining extensive experiments and mechanistic analysis to understand its behavior and underlying causes during training.
Contribution
It provides a comprehensive experimental study and explores multiple perspectives on the mechanism of grokking, advancing understanding of this phenomenon.
Findings
Grokking involves a sharp transition from memorization to generalization.
Training data fraction, model choice, and optimization influence grokking behavior.
Multiple viewpoints on the mechanism of grokking are discussed.
Abstract
Grokking in over-parameterized neural networks has attracted significant interest. The network first memorizes the training set, reaching zero training error while test error stays near random. Prolonged further training then produces a sharp transition from no generalization to perfect generalization. Our study combines extensive experiments with a survey of research on the mechanism of grokking. The experiments give insight into how grokking depends on the training data fraction, the model, and the optimization. Researchers have proposed several viewpoints on the mechanism of grokking, and we introduce some of these perspectives.
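The memorization phase described in the abstract can be illustrated with a minimal sketch. It assumes a modular-addition task, which is common in the grokking literature but is not stated in this summary, and all names (`make_modular_addition_split`, `make_memorizer`, `accuracy`) and the values `p=97` and `train_fraction=0.3` are hypothetical choices for illustration. A lookup-table "model" stands in for a network that has memorized the training set without generalizing; it is not the authors' model or training procedure.

```python
import random


def make_modular_addition_split(p=97, train_fraction=0.3, seed=0):
    """Enumerate all (a, b, (a + b) % p) examples and split them.

    train_fraction plays the role of the "training data fraction";
    p and the task itself are assumptions made for this sketch.
    """
    examples = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(examples)
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]


def make_memorizer(train):
    """Return a predictor that only recalls training pairs it has seen."""
    table = {(a, b): y for a, b, y in train}

    def predict(a, b):
        # Unseen pairs get a fixed guess of 0, i.e. no generalization.
        return table.get((a, b), 0)

    return predict


def accuracy(predict, examples):
    """Fraction of examples for which predict(a, b) equals the label."""
    correct = sum(1 for a, b, y in examples if predict(a, b) == y)
    return correct / len(examples)


train_set, test_set = make_modular_addition_split()
memorizer = make_memorizer(train_set)
train_acc = accuracy(memorizer, train_set)
test_acc = accuracy(memorizer, test_set)
```

Here `train_acc` is exactly 1.0 while `test_acc` stays close to chance (about 1/p), which mirrors the pre-transition regime. In a real experiment the lookup table would be replaced by a trained network, and both accuracies would be logged over many optimization steps to observe whether and when the test accuracy jumps.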
Taxonomy
Topics: Vibration and Dynamic Analysis
Methods: Sparse Evolutionary Training
