Grokking phase transitions in learning local rules with gradient descent
Bojan Žunkovič, Enej Ilievski

TL;DR
This paper models grokking as a phase transition in rule learning, derives exact analytic expressions for the critical exponents, grokking probability, and grokking time distribution, and connects the setup to statistical learning theory through a tensor-network map.
Contribution
It introduces two solvable models of grokking, derives their critical exponents, and uses a tensor-network map to show that grokking is a consequence of the locality of the teacher model.
Findings
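Grokking is a phase transition whose critical exponents are obtained exactly.
Exact analytic expressions are derived for the grokking probability and the grokking time distribution.
For a cellular automata learning task, the numerically determined critical exponent and grokking time distributions are compared with the predictions of the proposed model.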
Abstract
We discuss two solvable grokking (generalisation beyond overfitting) models in a rule learning scenario. We show that grokking is a phase transition and find exact analytic expressions for the critical exponents, grokking probability, and grokking time distribution. Further, we introduce a tensor-network map that connects the proposed grokking setup with the standard (perceptron) statistical learning theory and show that grokking is a consequence of the locality of the teacher model. As an example, we analyse the cellular automata learning task, numerically determine the critical exponent and the grokking time distributions and compare them with the prediction of the proposed grokking model. Finally, we numerically analyse the connection between structure formation and grokking.
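To make the notion of a local teacher concrete, the following is a minimal sketch of a cellular-automaton rule used as a data generator. It is an illustration only: the abstract does not specify the automaton, lattice size, or boundary conditions used in the paper, so the elementary (three-cell neighbourhood) automaton, the periodic boundaries, and the names `eca_step` and `make_dataset` are all assumptions made here.

```python
import numpy as np

def eca_step(state: np.ndarray, rule: int) -> np.ndarray:
    """One step of an elementary cellular automaton (assumed periodic boundaries)."""
    left = np.roll(state, 1)     # left[i]  == state[i - 1]
    right = np.roll(state, -1)   # right[i] == state[i + 1]
    idx = 4 * left + 2 * state + right      # neighbourhood read as a 3-bit number
    table = (rule >> np.arange(8)) & 1      # Wolfram rule number -> lookup table
    return table[idx]

def make_dataset(n_samples: int, width: int, rule: int, seed: int = 0):
    """Random binary inputs paired with their one-step images under the rule."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_samples, width))
    y = np.stack([eca_step(row, rule) for row in x])
    return x, y

# Hypothetical choice of rule 30, used purely for illustration.
x, y = make_dataset(n_samples=4, width=8, rule=30)
```

Each output cell depends only on three neighbouring input cells, which is the sense in which such a teacher is local: changing an input cell outside that neighbourhood leaves the output cell unchanged.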
Taxonomy
Topics: Neural Networks and Applications · Cellular Automata and Applications · Quantum many-body systems
