Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels

Marcel Tomàs Bernal; Neil Rohit Mallinar; Mikhail Belkin

arXiv:2604.00316 · stat.ML · April 2, 2026

TL;DR

This paper studies grokking in feature learning kernels and finds that breaking a certain symmetry in the training data is necessary for generalization. The Recursive Feature Machine (RFM) generalizes by recovering the invariance group underlying the data.

Contribution

The paper shows empirically that generalization requires breaking a symmetry in the training set, and that on algebraic tasks RFM generalizes by recovering the invariance group inherent in the data.

Findings

1. Generalization occurs only when a certain symmetry in the training set is broken.
2. RFM recovers the underlying invariance group of the data.
3. The learned feature matrices encode elements of the invariance group.

Abstract

Grokking is the phenomenon in which a model reaches high training accuracy long before it generalizes to unseen test points. It was first observed on a class of algebraic problems, such as learning modular arithmetic (Power et al., 2022). We study grokking on algebraic tasks in a class of feature learning kernels via the Recursive Feature Machine (RFM) algorithm (Radhakrishnan et al., 2024), which iteratively updates feature matrices through the Average Gradient Outer Product (AGOP) of an estimator in order to learn task-relevant features. Our main experimental finding is that generalization occurs only when a certain symmetry in the training set is broken. Furthermore, we empirically show that RFM generalizes by recovering the underlying invariance group action inherent in the data. We find that the learned feature matrices encode specific elements of the invariance…
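A minimal NumPy sketch of the RFM loop described in the abstract, applied to modular addition (a + b) mod p, may help make the AGOP update concrete. Each iteration fits kernel ridge regression with a kernel K_M parameterized by a feature matrix M, then replaces M by the AGOP of the fitted predictor f, that is M ← (1/n) Σ_j J_f(x_j)^T J_f(x_j), where J_f is the Jacobian of f. Several choices here are assumptions not taken from this page: the Gaussian kernel K_M(x, z) = exp(-(x - z)^T M (x - z) / (2h^2)), chosen for its closed-form gradient and possibly different from the kernels used in the paper; the bandwidth h, the ridge parameter, the trace rescaling of M after each step, the one-hot input encoding, and the uniformly random train/test split. All function names are hypothetical. This is an illustration, not the authors' implementation, and it makes no claim to reproduce grokking or the symmetry-breaking result.

```python
import numpy as np


def modular_addition_data(p, train_frac, seed=0):
    """All pairs (a, b) in Z_p x Z_p, one-hot encoded, labelled by (a + b) mod p."""
    rng = np.random.default_rng(seed)
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    a, b = a.ravel(), b.ravel()
    eye = np.eye(p)
    X = np.hstack([eye[a], eye[b]])      # inputs: concatenated one-hots, shape (p*p, 2p)
    Y = eye[(a + b) % p]                 # targets: one-hot of the sum, shape (p*p, p)
    perm = rng.permutation(p * p)        # assumption: uniformly random train/test split
    n_train = int(round(train_frac * p * p))
    tr, te = perm[:n_train], perm[n_train:]
    return X[tr], Y[tr], X[te], Y[te]


def gaussian_kernel(X, Z, M, bandwidth):
    """K_M(x, z) = exp(-(x - z)^T M (x - z) / (2 h^2)) for a symmetric PSD M."""
    XM = X @ M
    sq = (np.sum(XM * X, axis=1)[:, None]          # x^T M x
          + np.sum((Z @ M) * Z, axis=1)[None, :]   # z^T M z
          - 2.0 * XM @ Z.T)                        # -2 x^T M z
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def agop(X, alpha, M, bandwidth):
    """Average over training points of J(x_j)^T J(x_j), where J is the Jacobian
    of the kernel predictor f(x) = sum_i alpha_i K_M(x, x_i)."""
    K = gaussian_kernel(X, X, M, bandwidth)
    n, d = X.shape
    G = np.zeros((d, d))
    for j in range(n):
        diffs = X[j] - X                           # row i: x_j - x_i
        # grad_x K_M(x, x_i) = -K_M(x, x_i) M (x - x_i) / h^2, so J has shape (c, d)
        J = -(alpha * K[j][:, None]).T @ (diffs @ M) / bandwidth ** 2
        G += J.T @ J
    return G / n


def fit_alpha(X, Y, M, bandwidth, reg):
    """Kernel ridge regression coefficients for the current feature matrix M."""
    K = gaussian_kernel(X, X, M, bandwidth)
    return np.linalg.solve(K + reg * np.eye(len(X)), Y)


def rfm(X, Y, iters=5, bandwidth=2.5, reg=1e-3):
    """Alternate kernel ridge regression with the AGOP update M <- AGOP(f)."""
    d = X.shape[1]
    M = np.eye(d)                                  # start from the plain Gaussian kernel
    for _ in range(iters):
        alpha = fit_alpha(X, Y, M, bandwidth, reg)
        M = agop(X, alpha, M, bandwidth)
        M *= d / np.trace(M)   # assumption: trace rescaling keeps the kernel scale fixed
    return M, fit_alpha(X, Y, M, bandwidth, reg)


def predict(X_new, X, alpha, M, bandwidth=2.5):
    """Evaluate f(x) = sum_i alpha_i K_M(x, x_i) at the rows of X_new."""
    return gaussian_kernel(X_new, X, M, bandwidth) @ alpha


if __name__ == "__main__":
    X_tr, Y_tr, X_te, Y_te = modular_addition_data(p=11, train_frac=0.5)
    M, alpha = rfm(X_tr, Y_tr)
    acc = np.mean(predict(X_te, X_tr, alpha, M).argmax(1) == Y_te.argmax(1))
    print(f"test accuracy after 5 RFM iterations: {acc:.2f}")
```

Because M is replaced by the AGOP of the current predictor, the kernel's metric is reshaped toward input directions along which f varies most; inspecting the learned M (for example, its block structure across the a and b halves of the input) is one way to look for the kind of structure the paper attributes to the invariance group.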