Grokking as a Variance-Limited Phase Transition: Spectral Gating and the Epsilon-Stability Threshold
Pratyush Acharya, Habish Dhakal

TL;DR
This paper models grokking as a phase transition governed by spectral gating and variance regulation in AdamW optimizer dynamics, revealing the conditions under which generalization emerges long after training convergence.
Contribution
It introduces a spectral gating mechanism explaining grokking, highlighting the role of variance and stability thresholds in optimizer dynamics, and challenges existing hypotheses about flat minima.
Findings
Grokking occurs when gradient variance surpasses a stability threshold.
AdamW's anisotropic noise directs generalization in a spectral gating regime.
Three complexity regimes influence the learning dynamics and grokking onset.
Abstract
Standard optimization theories struggle to explain grokking, where generalization occurs long after training convergence. While geometric studies attribute this to slow drift, they often overlook the interaction between the optimizer's noise structure and landscape curvature. This work analyzes AdamW dynamics on modular arithmetic tasks, revealing a "Spectral Gating" mechanism that regulates the transition from memorization to generalization. We find that AdamW operates as a variance-gated stochastic system. Grokking is constrained by a stability condition: the generalizing solution resides in a sharp basin that is initially inaccessible under low-variance regimes. The "delayed" phase represents the accumulation of gradient variance required to lift the effective stability ceiling, permitting entry into this sharp manifold. Our ablation studies identify three…
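To make the idea of a variance-dependent stability ceiling concrete, here is a minimal sketch. It is not taken from the paper. It assumes a scalar parameter, the standard AdamW update with decoupled weight decay, and the textbook stability bound for plain gradient descent on a quadratic (curvature below 2 divided by the step size), applied to AdamW's effective step size lr / (sqrt(v_hat) + eps). Momentum and weight decay are ignored in the bound, so the numbers are illustrative only. The names `adamw_step` and `stability_ceiling` are hypothetical helpers.

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One scalar AdamW update with decoupled weight decay (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * weight_decay * theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

def stability_ceiling(v_hat, lr=1e-3, eps=1e-8):
    """Largest curvature a frozen-preconditioner step tolerates.

    Assumption: plain gradient descent on a quadratic is stable when
    curvature * step_size < 2, with step_size = lr / (sqrt(v_hat) + eps).
    """
    return 2.0 * (math.sqrt(v_hat) + eps) / lr

if __name__ == "__main__":
    # A larger second-moment estimate shrinks the effective step size,
    # which raises the curvature the update can tolerate.
    for v_hat in (1e-6, 1e-4, 1e-2):
        print(f"v_hat={v_hat:.0e}  ceiling={stability_ceiling(v_hat):.2f}")
```

Under these assumptions the ceiling grows with the square root of the second-moment estimate, which is one way to read the abstract's claim that accumulated gradient variance makes sharper basins reachable. When v_hat is tiny, eps dominates the denominator and sets a floor on the ceiling.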
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Metaheuristic Optimization Algorithms Research · Quantum Computing Algorithms and Architecture
