Dimensional Criticality at Grokking Across MLPs and Transformers
Ping Wang

TL;DR
This paper introduces a novel avalanche probe method to analyze the dynamical transition in neural networks during grokking, revealing task-dependent critical behavior and macroscopic signatures of the generalization transition.
Contribution
The study presents TDU–OFC, a new offline avalanche probe that uncovers task-dependent critical dynamics and macroscopic signatures of grokking in neural networks.
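The summary does not say how TDU–OFC maps gradient snapshots onto a lattice, or what its "Thresholded Diffusion Update" step does. The sketch below therefore shows only the textbook Olami-Feder-Christensen (OFC) relaxation rule that the probe is named after, on a 2D lattice with open boundaries. A site whose stress reaches the threshold resets to zero and passes a fraction alpha of its stress to each nearest neighbour. Stress that would leave the lattice is lost. The function name `ofc_avalanche` and the parameters `alpha` and `threshold` are hypothetical. Seeding the lattice from normalized gradient magnitudes is also an assumption.

```python
from collections import deque


def ofc_avalanche(stress, seed, alpha=0.2, threshold=1.0):
    """Relax one avalanche on a 2D OFC lattice with open boundaries.

    Hypothetical helper; generic OFC rule, not the paper's TDU-OFC procedure.
    stress: list of lists of floats, modified in place.
    seed: (row, col) site that is driven up to threshold to start the cascade.
    Returns (size, toppled_sites): number of topplings and the set of sites
    that toppled at least once.
    """
    rows, cols = len(stress), len(stress[0])
    r0, c0 = seed
    # Drive the seed site to threshold so the cascade starts there.
    stress[r0][c0] = max(stress[r0][c0], threshold)
    queue = deque([seed])
    size = 0
    toppled = set()
    while queue:
        r, c = queue.popleft()
        f = stress[r][c]
        if f < threshold:
            continue  # stale queue entry: site already relaxed below threshold
        stress[r][c] = 0.0  # topple: reset the site
        size += 1
        toppled.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:  # open boundary: edge loss
                stress[nr][nc] += alpha * f
                if stress[nr][nc] >= threshold:
                    queue.append((nr, nc))
    return size, toppled
```

With alpha below 0.25 every interior toppling dissipates stress, so each avalanche terminates. In a gradient-based probe one might, for example, rescale |g| for a reshaped weight tensor into [0, threshold) before calling this, but that mapping is an illustrative assumption.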
Findings
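Discovered a localized crossing of the Gaussian diffusion baseline by the time-resolved cascade dimension D(t), located at the generalization transition.
Observed opposite crossing directions for modular addition and XOR, indicating that both tasks share the critical crossing even though its direction is task-dependent.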
Avalanche distributions show heavy tails and finite-size scaling consistent with critical phenomena.
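One common way to assign an effective dimension to cascades is mass-radius scaling. If avalanche size grows as S ~ R^D, with R the radius of gyration of the toppled sites, then D is the slope of log S against log R. The summary describes the paper's estimator only as "grokking-aligned finite-size scaling", so the sketch below swaps in this simpler mass-radius fit as an illustration. It is not the authors' procedure. The names `radius_of_gyration` and `cascade_dimension` are hypothetical.

```python
import math


def radius_of_gyration(sites):
    """Root-mean-square distance of lattice sites (row, col) from their centroid."""
    n = len(sites)
    cr = sum(r for r, _ in sites) / n
    cc = sum(c for _, c in sites) / n
    return math.sqrt(sum((r - cr) ** 2 + (c - cc) ** 2 for r, c in sites) / n)


def cascade_dimension(avalanches):
    """Least-squares slope of log(size) vs log(radius of gyration).

    Hypothetical mass-radius estimator, not the paper's finite-size scaling.
    avalanches: iterable of (size, sites) pairs, e.g. from an OFC probe.
    """
    xs, ys = [], []
    for size, sites in avalanches:
        rg = radius_of_gyration(sites)
        if rg > 0:  # single-site avalanches carry no spatial extent
            xs.append(math.log(rg))
            ys.append(math.log(size))
    if len(xs) < 2:
        raise ValueError("need at least two extended avalanches")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all avalanches have the same radius")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

As a sanity check, compact square patches give a slope close to 2 and straight lines give a slope close to 1. Evaluating this estimator on avalanches from successive gradient snapshots would yield one way of building a time series D(t).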
Abstract
Abrupt transitions between distinct dynamical regimes are a hallmark of complex systems. Grokking in deep neural networks provides a striking example: an abrupt transition from memorization to generalization long after training accuracy saturates. Yet robust macroscopic signatures of this transition remain elusive. Here we introduce TDU–OFC (Thresholded Diffusion Update–Olami-Feder-Christensen), an offline avalanche probe that converts gradient snapshots into cascade statistics and extracts a macroscopic observable, the time-resolved effective cascade dimension, via grokking-aligned finite-size scaling. Across Transformers trained on modular addition and MLPs trained on XOR, we discover a localized dynamical crossing of the Gaussian diffusion baseline precisely at the generalization transition. The crossing direction is task-dependent: modular…
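Locating a "crossing of the baseline" in a time-resolved observable amounts to finding sign changes of D(t) minus the baseline value. The sign of each change gives the crossing direction. The sketch below does this with linear interpolation between consecutive snapshots. The baseline is passed in as a plain number because the summary does not state the value of the Gaussian diffusion baseline. The function name `baseline_crossings` is hypothetical.

```python
def baseline_crossings(d_series, baseline):
    """Find where a sampled time series D(t) crosses a fixed baseline value.

    Hypothetical helper. Returns a list of (t_star, direction) pairs, where
    t_star is the crossing time linearly interpolated between snapshots t-1
    and t, and direction is "up" (below -> at/above the baseline) or
    "down" (at/above -> below).
    """
    crossings = []
    for t in range(1, len(d_series)):
        prev = d_series[t - 1] - baseline
        curr = d_series[t] - baseline
        if (prev < 0 <= curr) or (prev >= 0 > curr):
            # Zero of the linear interpolant on [t-1, t]; prev - curr is
            # nonzero because prev and curr lie on opposite sides of zero.
            t_star = (t - 1) + prev / (prev - curr)
            crossings.append((t_star, "up" if curr >= 0 else "down"))
    return crossings
```

For example, `baseline_crossings([1.5, 1.8, 2.2, 2.3], 2.0)` reports a single upward crossing at t of about 1.5. A task whose D(t) falls through the baseline instead would be reported as "down".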
