Grokked Models are Better Unlearners
Yuanbang Liang, Yang Li

TL;DR
This paper demonstrates that models which have undergone grokking exhibit superior unlearning capabilities, enabling more efficient and stable removal of specific data influences without full retraining.
Contribution
It shows that applying unlearning after grokking improves efficiency and stability and reduces collateral damage across vision and language models.
Findings
Post-grokking models require fewer updates to forget data.
Models after grokking experience less collateral damage.
Feature analysis indicates more modular representations post-grokking.
Abstract
Grokking, the delayed generalization that emerges well after a model has fit the training data, has been linked to robustness and representation quality. We ask whether this training regime also helps with machine unlearning, i.e., removing the influence of specified data without full retraining. We compare applying standard unlearning methods before versus after the grokking transition across vision (CNNs/ResNets on CIFAR, SVHN, and ImageNet) and language (a transformer on a TOFU-style setup). Starting from grokked checkpoints consistently yields (i) more efficient forgetting (fewer updates to reach a target forget level), (ii) less collateral damage (smaller drops on retained and test performance), and (iii) more stable updates across seeds, relative to early-stopped counterparts under identical unlearning algorithms. Analyses of features and curvature further suggest that post-grokking…
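The three criteria named in the abstract can be made concrete with a small sketch. The function names (`updates_to_forget`, `collateral_damage`, `seed_spread`), the 10% forget target, and every number below are assumptions made for illustration; they are not the paper's code, protocol, or results.

```python
from statistics import pstdev


def updates_to_forget(forget_acc_trace, target):
    """Return the 1-based update count at which forget accuracy first
    drops to `target` or below, or None if it never does."""
    for step, acc in enumerate(forget_acc_trace, start=1):
        if acc <= target:
            return step
    return None


def collateral_damage(acc_before, acc_after):
    """Accuracy lost on retained (or test) data; positive means damage."""
    return acc_before - acc_after


def seed_spread(values):
    """Population standard deviation across seeds; smaller is more stable."""
    return pstdev(values)


# Placeholder numbers for illustration only; NOT results from the paper.
# Each trace is forget-set accuracy (%) after successive unlearning updates.
traces_by_seed = [
    [80.0, 55.0, 30.0, 9.0, 4.0],
    [82.0, 60.0, 35.0, 20.0, 8.0],
]
steps = [updates_to_forget(t, target=10.0) for t in traces_by_seed]
print("updates to forget per seed:", steps)
print("spread across seeds:", seed_spread(steps))
print("retain-accuracy drop:", collateral_damage(92.0, 90.5))
```

Under this reading, a before-versus-after-grokking comparison would compute the same three quantities for each starting checkpoint under an identical unlearning algorithm and compare them side by side.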
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
* The paper reveals a novel and interesting observation: grokking can result in a model with better unlearning abilities.
* Evaluation is done on both the vision and language domains, and several forgetting algorithms are included.
* As the authors have acknowledged in the paper, the experiments are limited to CIFAR10/100 and ResNet/CNNs in vision, and one LLM on TOFU in the language domain. Since this paper focuses on empirical results, more datasets (such as ImageNet-100) and architectures need to be evaluated to verify the consistency of the current findings.
* The authors provide an analysis on gradient correlation and local complexity to explain the findings. However, a theoretical analysis on why grokked models sho
- Bridges two important phenomena, grokking and machine unlearning, with clear empirical results, offering an important insight into creating robust unlearning.
- The core finding is consistent across diverse models and data modalities (vision and language), significantly increasing confidence in the generality of the approach.
- The paper is well written and easy to follow.
- The authors demonstrate consistent empirical performance across unlearning methods post-grokking.
- The paper lacks a deep explanation of why post-grok models unlearn better. It is unclear if the improved unlearning efficiency is due to structural changes in the representation space (e.g., features and localized memory influence) or changes in the loss landscape (e.g., a transition to a wider, shallower basin).
- Grokking requires significant additional compute time (extended training) well past the point of initial data fitting. Without a quantitative analysis, the computational cost of re
- The paper attempts to connect two emerging phenomena, grokking and machine unlearning, offering a novel angle on representation learning dynamics.
- The observation about grokked models may provide some insight into which representations would help machine unlearning.
1. The problem setting is fundamentally impractical. Machine unlearning is motivated by real-world scenarios where a pretrained model must selectively forget data upon request (e.g., GDPR). However, the paper assumes access to a post-grokking checkpoint of the same model, which is not feasible in practice. Grokking typically requires extreme overfitting (e.g., 50k+ epochs on CIFAR-10), far beyond standard pretraining regimes. There is no mechanism proposed to induce grokking-like features in p
The idea of investigating the effects of grokking on unlearning is highly interesting and provides novel insights into how seemingly unrelated aspects of training can have a massive impact on post-hoc operations.
W1 Please check for missing references/typos, e.g., "As illustrated in Figure ?? on page 1".
W2 The selection of unlearning methods is quite outdated (e.g., Fisher from 2020 instead of any newer similar methods, and both student-teacher models from 2023).
W3 Unlearning scenarios are only full class for section 3.1. It would be especially interesting to see how results change in subclass and random selection scenarios which are trickier to unlearn (as done in prior works of Chundawat etc.).
W4 Is i
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
