Grokking as Dimensional Phase Transition in Neural Networks
Ping Wang

TL;DR
This paper models neural network grokking as a dimensional phase transition: the effective dimensionality D crosses from sub-diffusive to super-diffusive at the onset of generalization, and D is determined by gradient field geometry.
Contribution
It introduces a finite-size scaling analysis of gradient avalanche dynamics, identifying grokking as a phase transition driven by gradient field geometry rather than architecture.
Findings
Grokking corresponds to a dimensional phase transition in which the effective dimensionality D crosses from the sub-diffusive to the super-diffusive regime.
Gradient field geometry, not network architecture, determines the effective dimensionality D.
Synthetic i.i.d. Gaussian gradients maintain D ≈ 1, while real training shows a dimensional excess arising from backpropagation correlations.
Abstract
Neural network grokking, the abrupt transition from memorization to generalization, challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a dimensional phase transition: the effective dimensionality D crosses from sub-diffusive (subcritical) to super-diffusive (supercritical) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, D reflects gradient field geometry, not network architecture: synthetic i.i.d. Gaussian gradients maintain D ≈ 1 regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized crossing, robust across topologies, offers new insight into the trainability of overparameterized networks.
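As a rough illustration of what a finite-size scaling fit can look like, the sketch below estimates a scaling exponent from the slope of log(characteristic avalanche size) against log(system size). This is an assumption made for illustration: the abstract does not state how D is defined or measured, and the function name, the power-law form, and the synthetic data here are all hypothetical rather than taken from the paper.

```python
import numpy as np

def estimate_scaling_exponent(sizes, cutoffs):
    """Fit log(cutoff) = slope * log(size) + intercept by least squares.

    Hypothetical helper: treats the slope as a finite-size scaling
    exponent. The paper's actual estimator for D may differ.
    """
    log_size = np.log(np.asarray(sizes, dtype=float))
    log_cutoff = np.log(np.asarray(cutoffs, dtype=float))
    slope, intercept = np.polyfit(log_size, log_cutoff, 1)
    return slope, intercept

# Synthetic stand-in data (not results from the paper): eight system
# sizes, with cutoffs generated from a known exponent plus small
# multiplicative noise.
rng = np.random.default_rng(0)
sizes = 2 ** np.arange(4, 12)          # eight sizes: 16, 32, ..., 2048
true_d = 1.0                           # exponent used to generate the data
noise = np.exp(rng.normal(0.0, 0.02, size=sizes.size))
cutoffs = 3.0 * sizes ** true_d * noise

d_hat, _ = estimate_scaling_exponent(sizes, cutoffs)
print(f"estimated exponent: {d_hat:.3f}")
```

On this synthetic input the fitted slope should land close to the exponent used to generate the data. With real avalanche statistics, the same fit would be repeated at different points in training to see whether the estimate moves across 1.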
