Grokking Beyond the Euclidean Norm of Model Parameters

Pascal Jr Tikeng Notsawo; Guillaume Dumas; Guillaume Rabusseau

arXiv:2506.05718·cs.LG·July 14, 2025

TL;DR

This paper investigates how regularization drives grokking: explicit or implicit regularization can induce delayed generalization in neural networks, with over-parameterization and data selection also shaping the effect.

Contribution

It demonstrates that regularizing a specific property (e.g., sparsity or low rank) can induce grokking, and that over-parameterization by adding depth enables grokking without explicit regularization. It also challenges the $ℓ_{2}$ norm as a proxy for generalization.

Findings

01 Small, non-zero regularization of a property $P$ (e.g., sparse or low-rank weights) induces grokking.

02 Over-parameterization by adding depth enables grokking without explicit regularization.

03 The $ℓ_{2}$ norm is unreliable as a proxy for generalization.

Abstract

Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. In this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property $P$ (e.g., sparse or low-rank weights) that generalizes on the problem of interest, gradient descent with a small but non-zero regularization of $P$ (e.g., $ℓ_{1}$ or nuclear norm regularization) results in grokking. This extends previous work showing that small non-zero weight decay induces grokking. Moreover, our analysis shows that over-parameterization by adding depth makes it possible to grok or ungrok without explicitly using regularization, which is impossible in shallow cases. We further show that the $ℓ_{2}$ norm is not a reliable proxy for generalization when the…
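To make the setting in the abstract concrete, here is a minimal sketch of gradient descent with a small but non-zero $ℓ_{1}$ penalty on an under-determined sparse regression problem. It is not the authors' experimental setup: the problem sizes, the helper names `make_sparse_problem` and `l1_regularized_gd`, and the values of `beta`, `lr`, and `steps` are all assumptions chosen for illustration. The function records the training loss and the distance to the sparse ground truth so the two curves can be compared over time.

```python
import numpy as np


def make_sparse_problem(n=30, d=100, k=3, seed=0):
    """Hypothetical toy problem: n < d linear measurements of a k-sparse vector."""
    rng = np.random.default_rng(seed)
    w_star = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    w_star[support] = rng.normal(size=k)
    X = rng.normal(size=(n, d))
    y = X @ w_star  # noiseless targets
    return X, y, w_star


def l1_regularized_gd(X, y, w_star, beta=1e-3, lr=1e-2, steps=2000, seed=0):
    """Subgradient descent on 0.5 * mean((Xw - y)^2) + beta * ||w||_1.

    Returns the final weights, the training loss per step, and the
    Euclidean distance to w_star per step (a stand-in for test error).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d)  # dense random initialization (an assumption)
    train_loss = np.empty(steps)
    recovery_error = np.empty(steps)
    for t in range(steps):
        residual = X @ w - y
        train_loss[t] = 0.5 * np.mean(residual ** 2)
        recovery_error[t] = np.linalg.norm(w - w_star)
        # Gradient of the data term plus a subgradient of the l1 penalty.
        grad = X.T @ residual / n + beta * np.sign(w)
        w = w - lr * grad
    return w, train_loss, recovery_error


if __name__ == "__main__":
    X, y, w_star = make_sparse_problem()
    w, train_loss, recovery_error = l1_regularized_gd(X, y, w_star)
    print("final training loss:", train_loss[-1])
    print("final distance to w_star:", recovery_error[-1])
```

Plotting `train_loss` and `recovery_error` on a logarithmic step axis for several values of `beta` is one way to look for a gap between the step at which the training loss becomes small and the step at which the recovery error follows. Swapping `np.sign(w)` for a subgradient of the nuclear norm would give the analogous sketch for low-rank matrix problems.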

Videos

Grokking Beyond the Euclidean Norm of Model Parameters · SlidesLive