The Norm-Separation Delay Law of Grokking: A First-Principles Theory of Delayed Generalization

Truong Xuan Khanh; Truong Quynh Hoa; Luu Duc Trung; Phan Thanh Duc

arXiv:2603.13331 · cs.AI · May 5, 2026

TL;DR

This paper presents a quantitative theory of the grokking delay, showing that grokking is a norm-driven phase transition in regularised training dynamics, with predictions validated across multiple tasks.

Contribution

It introduces the Norm-Separation Delay Law, which links the grokking delay to the optimiser's effective contraction rate and the ratio of squared parameter norms, supported by empirical validation across 293 training runs.

Findings

01

Grokking delay scales inversely with weight decay and learning rate.

02

The predicted logarithmic dependence of the delay on the norm ratio is confirmed.

03

AdamW optimizer enables grokking where SGD fails.

Abstract

Grokking -- the sudden generalisation that appears long after a model has perfectly memorised its training data -- has been widely observed but lacks a quantitative theory explaining the length of the delay. We show that grokking is a norm-driven representational phase transition in regularised training dynamics, and establish the Norm-Separation Delay Law: $T_{\mathrm{grok}} - T_{\mathrm{mem}} = \Theta\left(\gamma_{\mathrm{eff}}^{-1} \log\left(\|\theta_{\mathrm{mem}}\|^{2} / \|\theta_{\mathrm{post}}\|^{2}\right)\right)$, where $\gamma_{\mathrm{eff}}$ is the optimiser's effective contraction rate ($\gamma_{\mathrm{eff}} = \eta\lambda$ for SGD, $\gamma_{\mathrm{eff}} \geq \eta\lambda$ for AdamW). The upper bound follows from a discrete Lyapunov contraction argument; the matching lower bound from dynamical constraints of regularised first-order optimisation. Across 293 training runs spanning modular addition,…
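To make the scaling in the delay law concrete, the following is a minimal Python sketch, not the authors' code or proof. It evaluates the law with the constant hidden by $\Theta$ set to a placeholder `c = 1.0`, and compares it against a toy simulation in which the parameters shrink by pure weight decay, $\theta \leftarrow (1 - \eta\lambda)\,\theta$, with the loss gradient ignored. The function names, the idealised dynamics, and all numeric values are assumptions chosen for illustration.

```python
import math


def predicted_delay(gamma_eff, norm_mem_sq, norm_post_sq, c=1.0):
    """Delay estimate from the law: c * log(norm ratio) / gamma_eff.

    c stands in for the constant hidden by Theta; its value is an assumption.
    """
    return c * math.log(norm_mem_sq / norm_post_sq) / gamma_eff


def simulated_delay(eta, lam, norm_mem_sq, norm_post_sq):
    """Count steps of pure weight-decay shrinkage until the squared norm
    drops from norm_mem_sq to norm_post_sq.

    Idealised toy dynamics (assumption): the loss gradient is ignored, so
    each step multiplies the squared norm by (1 - eta * lam) ** 2.
    """
    shrink = (1.0 - eta * lam) ** 2
    norm_sq = norm_mem_sq
    steps = 0
    while norm_sq > norm_post_sq:
        norm_sq *= shrink
        steps += 1
    return steps


# Hypothetical values, not taken from the paper.
eta, lam = 1e-3, 1.0
norm_mem_sq, norm_post_sq = 100.0, 10.0

gamma_sgd = eta * lam  # gamma_eff = eta * lambda for SGD, per the abstract
est = predicted_delay(gamma_sgd, norm_mem_sq, norm_post_sq)
sim = simulated_delay(eta, lam, norm_mem_sq, norm_post_sq)
est_double_decay = predicted_delay(2 * gamma_sgd, norm_mem_sq, norm_post_sq)

print(f"law estimate (c=1): {est:.1f} steps")
print(f"toy simulation:     {sim} steps")
print(f"law estimate with doubled weight decay: {est_double_decay:.1f} steps")
```

In this toy setting the simulated step count and the law's estimate differ only by a constant factor, which is what a $\Theta(\cdot)$ statement allows, and doubling the weight decay halves the estimate. Real training runs include the loss gradient, and AdamW's contraction rate can exceed $\eta\lambda$, so the sketch illustrates the scaling only.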
