First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation

Truong Xuan Khanh; Truong Quynh Hoa; Luu Duc Trung; Phan Thanh Duc

arXiv:2605.18845·cs.LG·May 20, 2026

TL;DR

This paper presents a quantitative law predicting grokking delay under AdamW, validated across multiple architectures and tasks, with causal interventions confirming the theoretical insights.

Contribution

It introduces the first closed-form prediction of grokking delay as a first-passage time, incorporating AdamW corrections and causal validation.

Findings

01

The calibrated law predicts grokking delay with 17.7% MAPE on 26 held-out runs spanning a 41x delay range.

02

The law generalizes to MLPs with 18.0% MAPE (N=34) and degrades to 23.3% MAPE on cross-task extension (N=46).

03

Causal interventions confirm the importance of norm and angular reachability in grokking.
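
The error figures above are mean absolute percentage errors (MAPE). As a reminder of what that metric measures, here is a minimal Python sketch using the standard definition; the function name `mape` and the sample numbers are illustrative assumptions, not values or code from the paper.

```python
def mape(predicted, observed):
    """Mean absolute percentage error, in percent.

    Standard definition: mean over runs of |pred - obs| / |obs|, times 100.
    Assumes every observed delay is nonzero.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, o in zip(predicted, observed):
        if o == 0:
            raise ValueError("observed values must be nonzero")
        total += abs(p - o) / abs(o)
    return 100.0 * total / len(observed)


# Hypothetical delays (in optimizer steps), for illustration only.
example_predicted = [1100.0, 4500.0]
example_observed = [1000.0, 5000.0]
example_error = mape(example_predicted, example_observed)
```

Because each error is normalised by the observed delay, a run with a short delay counts as much as one with a long delay, which matters when delays span a wide range.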

Abstract

We give the first quantitative prediction of grokking delay under AdamW. Treating the delay as a first-passage time, we derive a closed-form law T_grok - T_mem = (1 / (2 kappa_LL eta lambda)) log(V_mem / V_star), where V_t = ||theta_t||^2 is the squared parameter norm, V_star is an architecture-dependent threshold, and kappa_LL absorbs the AdamW correction to the clean-SGD contraction rate 2 eta lambda. Calibrating (kappa_LL, V_star) on a single hyperparameter cell predicts grokking delays on 26 held-out runs with MAPE 17.7% over a 41x delay range; the law generalises to MLPs (MAPE 18.0%, N=34) and degrades to 23.3% on cross-task extension (N=46, 43.5x range), with a structured residual in which V_star / V_mem stays comparatively stable within architecture (CV about 14% on the 1L transformer). First passage of V_t is necessary but not sufficient. A quantile-margin theorem establishes…
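
One way to read the law: if the squared norm decays exponentially from V_mem at rate 2 kappa_LL eta lambda per step, then the first step at which it reaches V_star is log(V_mem / V_star) divided by that rate. The Python sketch below evaluates the closed form and compares it with a direct simulation of that decay. The pure-exponential decay model, the function names, and all numeric values are assumptions made for illustration; the paper's calibration procedure is not reproduced here.

```python
import math


def grokking_delay(v_mem, v_star, kappa_ll, eta, lam):
    """Closed-form T_grok - T_mem = log(V_mem / V_star) / (2 kappa_LL eta lambda)."""
    if not (v_mem > v_star > 0):
        raise ValueError("expected V_mem > V_star > 0")
    rate = 2.0 * kappa_ll * eta * lam  # effective contraction rate per step
    if rate <= 0:
        raise ValueError("contraction rate must be positive")
    return math.log(v_mem / v_star) / rate


def simulated_first_passage(v_mem, v_star, kappa_ll, eta, lam):
    """Count steps until V_t <= V_star, assuming V_t shrinks by exp(-rate) each step."""
    if not (v_mem > v_star > 0):
        raise ValueError("expected V_mem > V_star > 0")
    factor = math.exp(-2.0 * kappa_ll * eta * lam)
    if factor >= 1.0:
        raise ValueError("contraction rate must be positive")
    v, steps = v_mem, 0
    while v > v_star:
        v *= factor
        steps += 1
    return steps


# Hypothetical values, not taken from the paper.
params = dict(v_mem=100.0, v_star=100.0 / math.e, kappa_ll=1.0, eta=1e-3, lam=1.0)
closed_form = grokking_delay(**params)
simulated = simulated_first_passage(**params)
```

With these illustrative values the log ratio is 1 and the rate is 0.002, so the closed form gives a delay of about 500 steps, and the simulated count agrees to within one step. Halving eta or lambda doubles the predicted delay, which is the inverse scaling the law expresses.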
