Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Pascal Jr. Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, and Guillaume Dumas

TL;DR
This paper introduces a low-cost method to predict the occurrence of grokking in neural networks by analyzing early learning curve oscillations using Fourier transforms, enabling early detection without extensive training.
Contribution
It proposes a novel spectral analysis approach that predicts grokking from the initial epochs, reducing the need for long training runs when searching for the hyper-parameter settings under which grokking occurs.
Findings
Early epoch oscillations correlate with later grokking occurrence
Spectral signature analysis can predict grokking with high accuracy
Loss landscape analysis explains the origin of oscillations
Abstract
This paper focuses on predicting the occurrence of grokking in neural networks, a phenomenon in which perfect generalization emerges long after signs of overfitting or memorization are observed. It has been reported that grokking can only be observed with certain hyper-parameters. This makes it critical to identify the parameters that lead to grokking. However, since grokking occurs after a large number of epochs, searching for the hyper-parameters that lead to it is time-consuming. In this paper, we propose a low-cost method to predict grokking without training for a large number of epochs. In essence, by studying the learning curve of the first few epochs, we show that one can predict whether grokking will occur later on. Specifically, if certain oscillations occur in the early epochs, one can expect grokking to occur if the model is trained for a much longer period of time. We…
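As a rough illustration of the idea described in the abstract, the sketch below scores how strongly an early learning curve oscillates by looking at its Fourier spectrum. It is not the authors' method: the function name `oscillation_score`, the linear detrending step, and the `low_freq_bins` cutoff are all assumptions made for this example, and the two curves are synthetic.

```python
import numpy as np

def oscillation_score(losses, low_freq_bins=3):
    """Fraction of detrended spectral energy above the lowest frequency bins.

    Hypothetical score for illustration only; not the paper's spectral signature.
    """
    y = np.asarray(losses, dtype=float)
    t = np.arange(len(y))
    # Remove the slow linear trend so the spectrum reflects oscillations,
    # not the overall decrease of the loss.
    slope, intercept = np.polyfit(t, y, 1)
    residual = y - (slope * t + intercept)
    # Power spectrum of the real-valued residual (one-sided FFT).
    power = np.abs(np.fft.rfft(residual)) ** 2
    total = power.sum()
    if total == 0.0:
        return 0.0
    # Share of energy outside the lowest bins (assumed cutoff).
    return float(power[low_freq_bins:].sum() / total)

# Synthetic early-epoch curves (assumed shapes, not real training data).
epochs = np.arange(100)
smooth = np.exp(-epochs / 30.0)
oscillating = smooth + 0.1 * np.sin(2 * np.pi * 0.2 * epochs)

smooth_score = oscillation_score(smooth)
oscillating_score = oscillation_score(oscillating)
```

Under these assumptions, a curve with pronounced early oscillations receives a higher score than a smooth one. Any threshold for flagging a run as likely to grok would have to be calibrated on real training runs.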
Taxonomy
Topics: Neural Networks and Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
