Exploring Grokking: Experimental and Mechanistic Investigations

Hu Qiye; Zhou Hao; Yu RuoXi

arXiv:2412.10898·cs.LG·December 17, 2024

TL;DR

This paper investigates the grokking phenomenon in neural networks, combining extensive experiments and mechanistic analysis to understand its behavior and underlying causes during training.

Contribution

The paper provides a comprehensive experimental study of grokking and surveys multiple perspectives on its mechanism.

Findings

1. Grokking involves a sharp transition from memorization to generalization.

2. Training data fraction, model choice, and optimization influence grokking behavior.

3. Multiple viewpoints on the mechanism of grokking are discussed.

Abstract

Grokking in over-parameterized neural networks has attracted significant interest. The network first memorizes the training set, reaching zero training error while its test error stays near random. Prolonged further training then produces a sharp transition from no generalization to perfect generalization. Our study combines extensive experiments with a survey of research on the mechanism behind grokking. The experiments give insight into how grokking behaves with respect to the training data fraction, the model, and the optimization. Researchers have proposed various viewpoints on the mechanism of grokking, and we introduce some of these perspectives.
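
One way to make the "sharp transition" described in the abstract concrete is to measure the gap between the step at which the training set is fit and the step at which test accuracy catches up. The following is a minimal sketch, not the authors' procedure: the function names, the 0.99 threshold, and the logged curves are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def first_step_at_or_above(
    steps: Sequence[int], accuracy: Sequence[float], threshold: float
) -> Optional[int]:
    """Return the first logged step whose accuracy reaches the threshold."""
    for step, acc in zip(steps, accuracy):
        if acc >= threshold:
            return step
    return None


def grokking_delay(
    steps: Sequence[int],
    train_acc: Sequence[float],
    test_acc: Sequence[float],
    threshold: float = 0.99,  # assumed cutoff, not taken from the paper
) -> Optional[int]:
    """Steps between fitting the training set and generalizing to the test set.

    Returns None if either curve never reaches the threshold.
    """
    fit_step = first_step_at_or_above(steps, train_acc, threshold)
    gen_step = first_step_at_or_above(steps, test_acc, threshold)
    if fit_step is None or gen_step is None:
        return None
    return gen_step - fit_step


# Made-up curves for illustration only; these are not results from the paper.
steps = [100, 1_000, 10_000, 100_000]
train_acc = [0.60, 1.00, 1.00, 1.00]
test_acc = [0.01, 0.01, 0.02, 1.00]
delay = grokking_delay(steps, train_acc, test_acc)
```

A large delay relative to the step at which the training set is fit is the signature of grokking under this definition. Repeating the measurement across training data fractions, models, or optimizer settings would be one way to compare the factors the abstract lists.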

Taxonomy

Topics: Vibration and Dynamic Analysis

Methods: Sparse Evolutionary Training