Grokking as Dimensional Phase Transition in Neural Networks

Ping Wang

arXiv:2604.04655 · cs.LG · April 7, 2026

TL;DR

This paper models neural network grokking as a dimensional phase transition: the effective dimensionality D crosses from below 1 to above 1 at generalization onset, and D is set by gradient field geometry rather than by network architecture.

Contribution

It introduces a finite-size scaling analysis of gradient avalanche dynamics, identifying grokking as a phase transition driven by gradient field geometry rather than architecture.

Findings

01

Grokking corresponds to a dimensional phase transition: the effective dimensionality D crosses from sub-diffusive (D < 1) to super-diffusive (D > 1) at generalization onset.

02

Gradient field geometry, not network architecture, determines the effective dimensionality D.

03

Synthetic Gaussian gradients maintain D ≈ 1, while real training shows excess D from backpropagation correlations.

Abstract

Neural network grokking, the abrupt memorization-to-generalization transition, challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a dimensional phase transition: the effective dimensionality $D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects gradient field geometry, not network architecture: synthetic i.i.d. Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized $D(t)$ crossing, robust across topologies, offers new insight into the trainability of overparameterized networks.
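
The abstract does not spell out how $D$ is measured. As a rough illustration of what a finite-size scaling fit can look like, the sketch below assumes (this is an assumption, not the paper's stated definition) that an avalanche size cutoff scales with system size $N$ as a power law, cutoff ∝ N^D, and recovers the exponent from a log-log least-squares fit. The function name `fit_scaling_exponent`, the sizes, and the toy cutoffs are all hypothetical and generated from exact power laws, not taken from the paper.

```python
import numpy as np

def fit_scaling_exponent(sizes, cutoffs):
    """Slope of log(cutoff) versus log(size) by ordinary least squares.

    Assumes cutoff ~ size**D; this scaling form is an illustrative
    assumption, not the definition used in the paper.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_s = np.log(np.asarray(cutoffs, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_s, 1)
    return slope

# Hypothetical system sizes (eight scales) and noise-free toy cutoffs.
sizes = np.array([2 ** k for k in range(4, 12)])
toy_cutoffs = {
    "sub-diffusive toy (D = 0.7)": 3.0 * sizes ** 0.7,
    "marginal toy (D = 1.0)": 3.0 * sizes ** 1.0,
    "super-diffusive toy (D = 1.3)": 3.0 * sizes ** 1.3,
}

for label, cutoffs in toy_cutoffs.items():
    d_hat = fit_scaling_exponent(sizes, cutoffs)
    regime = "D < 1" if d_hat < 1 - 1e-6 else "D > 1" if d_hat > 1 + 1e-6 else "D ~ 1"
    print(f"{label}: fitted exponent {d_hat:.3f} ({regime})")
```

With real measurements the cutoffs would be noisy, so the fitted exponent would carry an uncertainty that a plain `polyfit` slope does not report.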
