Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Jaerin Lee; Bong Gyun Kang; Kihoon Kim; Kyoung Mu Lee

arXiv:2405.20233 · cs.LG · June 6, 2024 · 1 citation

TL;DR

This paper introduces a method to accelerate the grokking phenomenon in machine learning by amplifying slow-varying gradient components, significantly reducing the delay in generalization across various tasks.

Contribution

The authors propose a spectral decomposition-based technique to amplify slow-varying gradients, accelerating grokking by over 50 times with minimal code modifications.

Findings

01 Achieved over 50x acceleration of grokking in experiments.

02 Applicable to diverse tasks including images, languages, and graphs.

03 Simple implementation with only a few lines of code.

Abstract

One puzzling artifact in machine learning, dubbed grokking, is delayed generalization: the model generalizes only tens of times more iterations after it has near-perfectly overfit the training data. Focusing on the long delay itself on behalf of machine learning practitioners, our goal is to accelerate the generalization of a model under the grokking phenomenon. By regarding the series of gradients of a parameter over training iterations as a random signal over time, we can spectrally decompose the parameter trajectories under gradient descent into two components: the fast-varying, overfitting-yielding component and the slow-varying, generalization-inducing component. This analysis allows us to accelerate the grokking phenomenon by more than $50\times$ with only a few lines of code that amplify the slow-varying components of gradients. The experiments show that our algorithm applies to diverse tasks involving images,…
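The idea in the abstract, treating each parameter's gradient history as a signal and boosting its slow-varying part, can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes the low-pass filter is an exponential moving average (EMA), and the function name `amplify_slow_gradients` and the values `alpha=0.98` and `lamb=2.0` are hypothetical choices made for this example.

```python
def amplify_slow_gradients(grad, ema, alpha=0.98, lamb=2.0):
    """Add an amplified low-pass (EMA) copy of the gradient to the raw gradient.

    grad  : list of floats, the current gradient of one parameter tensor (flattened)
    ema   : list of floats holding the running EMA, or None on the first step
    alpha : EMA momentum (assumed value); closer to 1 keeps only slower components
    lamb  : amplification factor applied to the slow component (assumed value)

    Returns (filtered_grad, new_ema).
    """
    if ema is None:
        # First step: initialise the slow component with the gradient itself.
        new_ema = list(grad)
    else:
        # Low-pass filter: fast oscillations average out, slow trends survive.
        new_ema = [alpha * e + (1.0 - alpha) * g for e, g in zip(ema, grad)]

    # Amplify the slow component by adding it back on top of the raw gradient.
    filtered = [g + lamb * e for g, e in zip(grad, new_ema)]
    return filtered, new_ema


if __name__ == "__main__":
    # Toy signal: a constant (slow) trend of 1.0 plus a sign-flipping (fast) term.
    ema = None
    for t in range(1000):
        raw = [1.0 + (-1.0) ** t]
        filtered, ema = amplify_slow_gradients(raw, ema)
    print("slow component estimate:", ema[0])
    print("filtered gradient:", filtered[0])
```

In a training loop, a step like this would run between the backward pass and the optimizer update, with one EMA buffer per parameter. The extra cost is one buffer the size of the model's gradients.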

Code & Models

Repositories

ironjr/grokfast
PyTorch · Official

Models

neoncortex/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast
model · 10 downloads · 1 like

Taxonomy

Topics: Image Enhancement Techniques