Grokking phase transitions in learning local rules with gradient descent

Bojan Žunkovič; Enej Ilievski

arXiv:2210.15435 · cond-mat.stat-mech · October 28, 2022

Open Access

TL;DR

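This paper models grokking (generalisation beyond overfitting) as a phase transition in rule learning, derives exact analytic expressions for the critical exponents, grokking probability, and grokking time distribution, and connects the setup to standard (perceptron) statistical learning theory through a tensor-network map.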

Contribution

The paper introduces two solvable models of grokking, derives their critical exponents exactly, and uses a tensor-network map to show that grokking is a consequence of the locality of the teacher model.

Findings

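01. Grokking is characterized as a phase transition, with exact analytic expressions for the critical exponents.

02. Analytic expressions for the grokking probability and the grokking time distribution are derived.

03. For a cellular automata learning task, the critical exponent and grokking time distributions are determined numerically and compared with the predictions of the proposed model.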

Abstract

We discuss two solvable grokking (generalisation beyond overfitting) models in a rule learning scenario. We show that grokking is a phase transition and find exact analytic expressions for the critical exponents, grokking probability, and grokking time distribution. Further, we introduce a tensor-network map that connects the proposed grokking setup with the standard (perceptron) statistical learning theory and show that grokking is a consequence of the locality of the teacher model. As an example, we analyse the cellular automata learning task, numerically determine the critical exponent and the grokking time distributions and compare them with the prediction of the proposed grokking model. Finally, we numerically analyse the connection between structure formation and grokking.
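To make the notion of a "local" teacher concrete, the following is a minimal Python sketch of a rule-learning dataset built from an elementary cellular automaton. It is an illustration under stated assumptions, not the paper's setup: the abstract does not specify the automaton, the input encoding, or the label definition, so the choice of a three-cell neighbourhood, the Wolfram rule numbering, and the helper names `eca_rule_table`, `eca_step`, and `make_dataset` are all hypothetical. The point it shows is only that the label depends on a few neighbouring cells of a much longer input.

```python
import random


def eca_rule_table(rule_number):
    """Map each (left, centre, right) neighbourhood to the next centre state.

    Uses the standard Wolfram numbering: bit k of rule_number is the output
    for the neighbourhood whose cells, read as a binary number, equal k.
    """
    return {
        (l, c, r): (rule_number >> (l * 4 + c * 2 + r)) & 1
        for l in (0, 1)
        for c in (0, 1)
        for r in (0, 1)
    }


def eca_step(state, table):
    """Apply one update of the automaton with periodic boundaries."""
    n = len(state)
    return [
        table[(state[(i - 1) % n], state[i], state[(i + 1) % n])]
        for i in range(n)
    ]


def make_dataset(num_samples, width, rule_number, seed=0):
    """Sample random binary inputs and label them with a local teacher.

    Assumption for illustration: the label is the next state of the middle
    cell, so it depends on only three of the `width` input bits.
    """
    if width < 3:
        raise ValueError("width must be at least 3")
    rng = random.Random(seed)
    table = eca_rule_table(rule_number)
    mid = width // 2
    data = []
    for _ in range(num_samples):
        x = [rng.randint(0, 1) for _ in range(width)]
        y = table[(x[mid - 1], x[mid], x[mid + 1])]
        data.append((x, y))
    return data
```

A student model trained on such pairs sees all `width` bits and has to discover that only the three central ones matter, which is the sense of locality this sketch is meant to convey.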

Taxonomy

Topics: Neural Networks and Applications · Cellular Automata and Applications · Quantum many-body systems