Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Jaerin Lee, Bong Gyun Kang, Kihoon Kim, Kyoung Mu Lee

TL;DR
This paper introduces a method to accelerate the grokking phenomenon in machine learning by amplifying slow-varying gradient components, significantly reducing the delay in generalization across various tasks.
Contribution
The authors propose a spectral decomposition-based technique to amplify slow-varying gradients, accelerating grokking by over 50 times with minimal code modifications.
Findings
Achieved over 50x acceleration of grokking in experiments.
Applicable to diverse tasks including images, languages, and graphs.
Simple implementation with only a few lines of code.
Abstract
One puzzling artifact in machine learning dubbed grokking is where delayed generalization is achieved tenfolds of iterations after near-perfect overfitting to the training data. Focusing on the long delay itself on behalf of machine learning practitioners, our goal is to accelerate generalization of a model under the grokking phenomenon. By regarding a series of gradients of a parameter over training iterations as a random signal over time, we can spectrally decompose the parameter trajectories under gradient descent into two components: the fast-varying, overfitting-yielding component and the slow-varying, generalization-inducing component. This analysis allows us to accelerate the grokking phenomenon more than 50× with only a few lines of code that amplify the slow-varying components of gradients. The experiments show that our algorithm applies to diverse tasks involving images,…
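To make the idea of "amplifying the slow-varying components of gradients" concrete, here is a minimal sketch in plain Python. It assumes the low-pass filter is an exponential moving average (EMA) of past gradients and that the filtered signal is added back to the raw gradient with a gain. The function name `amplify_slow_gradients` and the default values `alpha=0.98` and `lamb=2.0` are illustrative assumptions, not taken from the text above.

```python
def amplify_slow_gradients(grad, ema, alpha=0.98, lamb=2.0):
    """One filtering step for a single parameter's gradient (a list of floats).

    Assumptions for this sketch (not stated in the summary above):
      - the slow component is estimated with an EMA (momentum `alpha`),
      - the slow component is added back to the raw gradient with gain `lamb`.
    Returns (filtered_grad, new_ema).
    """
    if ema is None:
        # First step: start the running average at the current gradient.
        ema = list(grad)
    else:
        # Low-pass filter: fast oscillations average out, slow drift remains.
        ema = [alpha * m + (1.0 - alpha) * g for m, g in zip(ema, grad)]
    # Amplify the slow component on top of the unmodified gradient.
    filtered = [g + lamb * m for g, m in zip(grad, ema)]
    return filtered, ema


# Toy signal: a small constant drift (slow) plus a sign-flipping term (fast).
ema_state = None
for step in range(1000):
    slow = 0.1
    fast = 1.0 if step % 2 == 0 else -1.0
    filtered_grad, ema_state = amplify_slow_gradients([slow + fast], ema_state)
```

In this toy run the EMA settles close to the slow drift of 0.1 while the alternating term is mostly suppressed, so the added `lamb * ema` term boosts only the slow part. In an actual training loop, a step like this would plausibly sit between the backward pass and the optimizer update, applied to each parameter's gradient.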
Taxonomy
Topics: Image Enhancement Techniques
