Grokking and Generalization Collapse: Insights from HTSR theory
Hari K. Prakash, Charles H. Martin

TL;DR
This paper investigates the grokking phenomenon in neural networks, reveals a late-stage anti-grokking phase characterized by a collapse in test accuracy, and demonstrates that the HTSR layer quality metric alpha detects all three training phases, including late-stage overfitting.
Contribution
The study introduces the concept of anti-grokking, identifies the HTSR alpha metric as an indicator that delineates all three training phases, and provides a way to detect overfitting without access to test data.
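The HTSR layer quality metric alpha is the exponent of a power law fitted to the tail of a layer's empirical spectral density (ESD), i.e., the eigenvalues of X = W^T W / N. Because it needs only the weights, it can be computed without any test data. The sketch below is a minimal NumPy illustration, assuming a continuous power-law maximum-likelihood fit with xmin chosen by the smallest Kolmogorov-Smirnov distance (the Clauset-Shalizi-Newman procedure). The function names `esd` and `fit_alpha`, the normalization by the larger matrix dimension, and the `min_tail` parameter are hypothetical choices; WeightWatcher's own fitting differs in its details.

```python
from typing import Optional

import numpy as np


def esd(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of X = W^T W / N for one layer (its empirical spectral density).

    N is taken as the larger dimension of W (an assumed convention).
    Squared singular values give the eigenvalues without forming X explicitly.
    """
    n_big = max(W.shape)
    sv = np.linalg.svd(W, compute_uv=False)
    return np.sort(sv ** 2 / n_big)


def _mle_alpha(tail: np.ndarray, xmin: float) -> float:
    # Continuous power-law MLE for p(x) ~ x^(-alpha), x >= xmin:
    # alpha = 1 + n / sum(log(x / xmin))
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))


def fit_alpha(evals: np.ndarray, xmin: Optional[float] = None, min_tail: int = 50) -> float:
    """Estimate the power-law exponent alpha of the ESD tail.

    If xmin is given, fit the tail above it directly. Otherwise scan candidate
    xmin values (keeping at least min_tail + 1 points in the tail) and return
    the alpha whose fitted power law has the smallest KS distance to the data.
    Returns NaN if there are too few eigenvalues to scan.
    """
    evals = np.sort(evals[evals > 0])
    if xmin is not None:
        return _mle_alpha(evals[evals >= xmin], xmin)

    best_alpha, best_d = np.nan, np.inf
    for i in range(evals.size - min_tail):
        x0 = evals[i]
        tail = evals[i:]
        a = _mle_alpha(tail, x0)
        # Compare the empirical CDF of the tail with the fitted power-law CDF.
        emp_cdf = np.arange(1, tail.size + 1) / tail.size
        fit_cdf = 1.0 - (tail / x0) ** (1.0 - a)
        d = np.max(np.abs(emp_cdf - fit_cdf))
        if d < best_d:
            best_alpha, best_d = a, d
    return best_alpha
```

Applied to each layer at each saved checkpoint, this yields one alpha trajectory per layer. In HTSR theory, a smaller alpha means a heavier-tailed, more strongly correlated spectrum.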
Findings
Anti-grokking occurs very late in training: test accuracy collapses while training accuracy stays perfect.
The HTSR alpha metric detects all three phases, including overfitting.
Correlation Traps signal overfitting and are identified via spectral analysis.
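Correlation Traps can be illustrated with a randomization test. The following is a minimal sketch, assuming the HTSR convention that a trap is an eigenvalue of the elementwise-shuffled weight matrix that lies beyond the Marchenko-Pastur (MP) bulk edge lambda_plus = sigma^2 (1 + sqrt(M/N))^2, where N >= M are the matrix dimensions and sigma^2 is the entry variance. Shuffling destroys learned correlations, so an outlier that survives it points to unusually large individual weights (or a nonzero mean) rather than learned structure. The function names and the `tol` safety margin are hypothetical, and WeightWatcher's randomized analysis may compute this differently.

```python
import numpy as np


def mp_upper_edge(n_rows: int, n_cols: int, sigma2: float) -> float:
    """Marchenko-Pastur bulk edge for X = W^T W / N with i.i.d. entries of variance sigma2."""
    n_big, n_small = max(n_rows, n_cols), min(n_rows, n_cols)
    return sigma2 * (1.0 + np.sqrt(n_small / n_big)) ** 2


def count_correlation_traps(W: np.ndarray, seed: int = 0, tol: float = 1.1) -> int:
    """Count eigenvalues of the randomized layer lying beyond tol * lambda_plus.

    W is shuffled elementwise, which keeps the distribution of weights but
    destroys their learned arrangement. Eigenvalues of the shuffled matrix
    above the MP edge (times a safety margin tol) are flagged as traps.
    """
    rng = np.random.default_rng(seed)
    W_rand = rng.permutation(W.ravel()).reshape(W.shape)
    n_big = max(W.shape)
    evals = np.linalg.svd(W_rand, compute_uv=False) ** 2 / n_big
    lam_plus = mp_upper_edge(*W.shape, sigma2=W_rand.var())
    return int(np.sum(evals > tol * lam_plus))
```

For a Gaussian random matrix the count should typically be 0, whereas planting a single very large entry produces a spike that survives shuffling.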
Abstract
We study the well-known grokking phenomenon in neural networks (NNs) using a 3-layer MLP trained on a 1k-sample subset of MNIST, with and without weight decay, and discover a novel third phase, anti-grokking, that occurs very late in training. Like the familiar pre-grokking phase, it shows test accuracy collapsing while training accuracy stays perfect, yet it is distinct from both the pre-grokking and grokking phases and is not detected by other proposed grokking progress measures. Leveraging Heavy-Tailed Self-Regularization (HTSR) through the open-source WeightWatcher tool, we show that the HTSR layer quality metric alpha alone delineates all three phases, whereas the best competing metrics detect only the first two. The anti-grokking phase is revealed by training for and is invariably heralded by and the…
