Progress Measures for Grokking on Real-world Tasks
Satvik Golechha

TL;DR
This paper investigates the phenomenon of grokking in real-world datasets, introducing new progress measures that better correlate with generalization than traditional weight norm metrics.
Contribution
It introduces three novel progress measures (activation sparsity, absolute weight entropy, and approximate local circuit complexity) that help explain grokking beyond what weight norms capture.
Findings
Grokking can occur outside the expected weight norm ranges.
New progress measures show stronger correlation with grokking than weight norms.
The results challenge the hypothesis that weight norms are the primary cause of grokking.
Abstract
Grokking, a phenomenon where machine learning models generalize long after overfitting, has been primarily observed and studied in algorithmic tasks. This paper explores grokking in real-world datasets using deep neural networks for classification under the cross-entropy loss. We challenge the prevalent hypothesis that the norm of weights is the primary cause of grokking by demonstrating that grokking can occur outside the expected range of weight norms. To better understand grokking, we introduce three new progress measures: activation sparsity, absolute weight entropy, and approximate local circuit complexity. These measures are conceptually related to generalization and demonstrate a stronger correlation with grokking in real-world datasets compared to weight norms. Our findings suggest that while weight norms might usually correlate with grokking and our progress measures,…
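To make the measures named in the abstract more concrete, here is a minimal sketch of how a weight norm, an activation sparsity score, and an absolute weight entropy could be computed from flat lists of numbers. The exact definitions used in the paper are not given in this summary, so the function names, the `threshold` parameter, and the choice to normalize absolute weights into a probability distribution before taking the entropy are all assumptions made for illustration. Approximate local circuit complexity is left out because the summary does not describe how it is computed.

```python
import math


def l2_weight_norm(weights):
    """L2 norm of a flat list of weights (the baseline measure)."""
    return math.sqrt(sum(w * w for w in weights))


def activation_sparsity(activations, threshold=1e-6):
    """Fraction of activations whose magnitude is at most `threshold`.

    Assumption: "sparsity" is taken to mean the share of (near-)zero
    activations; the paper may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    inactive = sum(1 for a in activations if abs(a) <= threshold)
    return inactive / len(activations)


def absolute_weight_entropy(weights):
    """Shannon entropy of the normalized absolute weights.

    Assumption: |w_i| is divided by sum_j |w_j| so the values form a
    probability distribution; the paper may skip this normalization.
    """
    magnitudes = [abs(w) for w in weights]
    total = sum(magnitudes)
    if total == 0.0:
        return 0.0  # all-zero weights: treat the entropy as zero
    entropy = 0.0
    for m in magnitudes:
        if m > 0.0:  # zero-magnitude weights contribute nothing
            p = m / total
            entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    example_weights = [0.5, -0.5, 0.0, 2.0]
    example_activations = [0.0, 0.3, 0.0, 1.2]
    print("L2 norm:", l2_weight_norm(example_weights))
    print("activation sparsity:", activation_sparsity(example_activations))
    print("absolute weight entropy:", absolute_weight_entropy(example_weights))
```

Under these assumed definitions, logging all three values at each training step alongside test accuracy would be one way to check which of them moves when generalization finally improves.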
Taxonomy
Topics: Human Pose and Action Recognition
