Grokking in the Ising Model

Karolina Hutchison; David Yevick

arXiv:2510.25966 · cond-mat.dis-nn · February 6, 2026

TL;DR

This paper investigates grokking, a delayed generalization phenomenon, in a dense neural network trained with weight decay to classify 2D Ising model configurations by energy. It finds a transition to sparse subnetworks that enhance global feature recognition and generalization.
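
As a rough illustration of what "delayed" means here, the sketch below measures the gap, in epochs, between training accuracy and test accuracy first crossing a threshold. The 0.95 threshold, the helper names `first_epoch_above` and `grokking_delay`, and the sigmoid-shaped accuracy curves are hypothetical choices made for this example, not quantities or data taken from the paper.

```python
import numpy as np

def first_epoch_above(acc, threshold):
    """Return the first epoch index at which accuracy reaches threshold, or None."""
    acc = np.asarray(acc)
    hits = np.nonzero(acc >= threshold)[0]
    return int(hits[0]) if hits.size else None

def grokking_delay(train_acc, test_acc, threshold=0.95):
    """Epochs between training and test accuracy first reaching the (assumed) threshold."""
    t_train = first_epoch_above(train_acc, threshold)
    t_test = first_epoch_above(test_acc, threshold)
    if t_train is None or t_test is None:
        return None  # one of the curves never reaches the threshold
    return t_test - t_train

# Synthetic accuracy curves (illustrative only, not results from the paper):
# training accuracy rises early, test accuracy rises much later.
epochs = np.arange(1000)
train_acc = 1.0 / (1.0 + np.exp(-(epochs - 50) / 10))
test_acc = 1.0 / (1.0 + np.exp(-(epochs - 600) / 20))
print(grokking_delay(train_acc, test_acc))
```

A large positive value corresponds to the delayed rise in test accuracy; in practice the number depends on the chosen threshold, so it is best read as a relative measure.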

Contribution

It introduces PCA-based techniques for analyzing individual network layers during grokking and uncovers a transition to sparse subnetworks that improves generalization on Ising model configurations.
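
The paper's PCA-based layer analysis techniques are not described on this page, so the following is only a generic sketch of the underlying idea: ordinary principal component analysis of one layer's activations, computed with NumPy's SVD. The function name `layer_pca` and the random stand-in activations are hypothetical; this is not the authors' method.

```python
import numpy as np

def layer_pca(activations, n_components=2):
    """Plain PCA of one layer's activations (rows = input samples, columns = units).

    Returns each sample's coordinates on the leading principal components
    and the fraction of total variance explained by each of those components.
    """
    X = np.asarray(activations, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each unit's activations
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2                                  # variance carried by each component
    ratio = var / var.sum()                       # explained-variance ratios (descending)
    proj = Xc @ Vt[:n_components].T               # project samples onto leading PCs
    return proj, ratio[:n_components]

# Illustrative call on random stand-in activations (not from a trained network)
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 16))
proj, ratio = layer_pca(acts, n_components=2)
print(proj.shape, ratio)
```

Plotting `proj` colored by class label is one common way to check whether a given layer separates the input classes.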

Findings

1. Grokking involves a transition from a connected network to sparse subnetworks.
2. Sparse subnetworks reduce the classification errors that arise from a multiplicity of paths.
3. The final layers identify global features, which enables generalization.
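
One simple way to quantify "sparse subnetworks" is to count, layer by layer, the weights whose magnitude exceeds a small tolerance and check whether that count falls with depth. The sketch below assumes a magnitude cutoff `tol = 1e-3`, a hypothetical 100-64-32-4 dense architecture, and hypothetical helper names; the paper's actual criterion for an "active" weight is not given here.

```python
import numpy as np

def active_weight_counts(weight_matrices, tol=1e-3):
    """Count weights with |w| > tol in each layer, listed from input to output."""
    return [int(np.count_nonzero(np.abs(W) > tol)) for W in weight_matrices]

def decreases_with_depth(counts):
    """True if the active-weight count never increases from one layer to the next."""
    return all(a >= b for a, b in zip(counts, counts[1:]))

# Hypothetical trained weights for a dense 100-64-32-4 network, in which most
# entries have been shrunk to nearly zero (as strong weight decay might do).
rng = np.random.default_rng(1)
shapes = [(100, 64), (64, 32), (32, 4)]
keep_fractions = [0.05, 0.10, 0.50]  # illustrative fraction of weights left "active"
layers = []
for shape, keep in zip(shapes, keep_fractions):
    W = rng.normal(size=shape)
    W[rng.random(shape) > keep] *= 1e-6  # suppress all but the kept fraction
    layers.append(W)

counts = active_weight_counts(layers)
print(counts, decreases_with_depth(counts))
```

The check is non-strict (equal counts in consecutive layers still pass); a strict version would use `a > b` instead.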

Abstract

Delayed generalization, termed grokking, occurs in a machine learning calculation when the increase in test accuracy lags behind the increase in training accuracy. This paper examines grokking in a dense neural network trained with weight decay to classify 2D Ising model configurations into four equally spaced energy regions. Partly with the aid of novel PCA-based techniques for analyzing network layers, the observed behavior is interpreted as a transition from a connected network to a group of sparse subnetworks in which the number of active weights in each layer decreases monotonically with depth. This architecture reduces the classification errors that result from a multiplicity of paths through the network. As in a convolutional neural network, the final network layers sequentially identify global features of the input classes, which enables generalization to previously unseen patterns.
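
To make the classification task concrete, here is a minimal sketch of computing the energy of a 2D Ising configuration and assigning it to one of four equal-width energy bins. It assumes a square L × L lattice with periodic boundaries, coupling J = 1, no external field, and bins spanning the full energy range [-2JL², 2JL²]; these conventions, and the names `ising_energy` and `energy_class`, are illustrative assumptions rather than details confirmed by the abstract.

```python
import numpy as np

def ising_energy(s, J=1.0):
    """E = -J * sum of s_i * s_j over nearest-neighbour bonds on a periodic
    L x L lattice of +/-1 spins; each bond is counted once (right and down)."""
    right = np.roll(s, -1, axis=1)
    down = np.roll(s, -1, axis=0)
    return -J * float(np.sum(s * right + s * down))

def energy_class(E, L, n_classes=4, J=1.0):
    """Label 0..n_classes-1 from equal-width bins over [-2*J*L^2, 2*J*L^2]."""
    e_min, e_max = -2.0 * J * L * L, 2.0 * J * L * L
    width = (e_max - e_min) / n_classes
    k = int((E - e_min) // width)
    return min(k, n_classes - 1)  # place E = e_max in the top bin

rng = np.random.default_rng(0)
L = 16
up = np.ones((L, L), dtype=int)                         # all spins aligned (lowest energy)
checker = (np.indices((L, L)).sum(axis=0) % 2) * 2 - 1  # alternating spins (highest energy)
rand = rng.choice([-1, 1], size=(L, L))                 # uniformly random spins
for name, s in [("all up", up), ("checkerboard", checker), ("random", rand)]:
    E = ising_energy(s)
    print(name, E, energy_class(E, L))
```

Uniformly random spins, used here only to keep the example short, cluster near E = 0 and would almost never fall in the outer two classes, so a balanced training set would need a different sampling scheme.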
