Grokking in the Ising Model
Karolina Hutchison, David Yevick

TL;DR
This paper investigates grokking, a delayed generalization phenomenon, in neural networks trained on the Ising model, revealing a transition to sparse subnetworks that enhance global feature recognition and generalization.
Contribution
It introduces a PCA-based analysis of grokking in neural networks and uncovers a transition to sparse subnetworks that improve generalization in the Ising model context.
Findings
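Grokking involves a transition from a connected network to a group of sparse subnetworks, with the number of active weights per layer decreasing monotonically with depth.
The sparse architecture reduces classification errors that result from a multiplicity of paths through the network.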
Final layers identify global features enabling generalization.
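As a rough illustration of the sparsity finding, the sketch below counts "active" weights per layer and checks whether the count falls with depth. The magnitude threshold and both function names are assumptions made here for illustration; the summary does not state how the paper defines an active weight.

```python
import numpy as np

def active_weight_counts(weights, threshold=1e-2):
    """Count weights per layer whose magnitude exceeds a threshold.

    `threshold` is a hypothetical cutoff, not a value taken from the paper.
    """
    return [int(np.sum(np.abs(w) > threshold)) for w in weights]

def decreases_with_depth(counts):
    """True if each layer has strictly fewer active weights than the one before."""
    return all(a > b for a, b in zip(counts, counts[1:]))

# Toy weight matrices standing in for two consecutive dense layers.
layers = [
    np.array([[0.5, 0.0], [0.2, -0.3]]),
    np.array([[0.001, 0.4]]),
]
counts = active_weight_counts(layers)
print(counts, decreases_with_depth(counts))
```

Tracking these counts across training epochs would be one way to look for the transition described above, though the paper's own analysis also relies on PCA-based techniques that are not reproduced here.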
Abstract
Delayed generalization, termed grokking, occurs in a machine learning calculation when the increase in test accuracy lags the increase in training accuracy. This paper examines grokking in a dense neural network trained, in the presence of weight decay, to classify 2D Ising model configurations into 4 equally spaced energy regions. Aided in part by novel PCA-based techniques for analyzing network layers, the paper interprets the observed behavior as a transition from a connected network to a group of sparse subnetworks in which the number of active weights in each layer decreases monotonically with depth. This architecture reduces classification errors resulting from a multiplicity of paths. The final network layers, as in a convolutional neural network, sequentially identify global features of the input classes, which enables generalization to previously unseen patterns.
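The classification task in the abstract can be sketched as follows: compute the nearest-neighbor energy of a 2D spin configuration, then assign it to one of 4 equally spaced energy bins. This is a minimal sketch, not the authors' code. The periodic boundary conditions, the coupling J = 1, the absence of an external field, and the choice to spread the bins over the full range from -J·(number of bonds) to +J·(number of bonds) are all assumptions; the function names are hypothetical.

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """Nearest-neighbor Ising energy E = -J * sum over bonds of s_i * s_j.

    Assumes periodic boundaries and no external field. Each bond is counted
    once by pairing every site with its right and its lower neighbor.
    """
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return float(-J * np.sum(spins * (right + down)))

def energy_class(spins, n_classes=4, J=1.0):
    """Label a configuration with the index of its energy bin (0 = lowest)."""
    n_bonds = 2 * spins.size                 # two bonds per site on a periodic lattice
    edges = np.linspace(-J * n_bonds, J * n_bonds, n_classes + 1)
    # Compare against the interior edges only, so labels run 0..n_classes-1.
    return int(np.digitize(ising_energy(spins, J), edges[1:-1]))

# Fully aligned 4x4 lattice: the lowest-energy configuration.
aligned = np.ones((4, 4), dtype=int)
# Checkerboard 4x4 lattice: every bond is antialigned, the highest energy.
checker = np.indices((4, 4)).sum(axis=0) % 2 * 2 - 1

print(ising_energy(aligned), energy_class(aligned))
print(ising_energy(checker), energy_class(checker))
```

Under these assumptions the aligned lattice falls in the lowest bin and the checkerboard in the highest. Flattened spin arrays with labels of this kind would form the inputs and targets for a dense classifier like the one the abstract describes.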
