Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking
Yongzhong Xu

TL;DR
This empirical study investigates the feature repulsion mechanism in two-layer networks during grokking, revealing activation-dependent spectral signatures and confirming theoretical predictions about feature separation.
Contribution
It provides the first empirical validation of Tian's feature repulsion theorem and explores how activation functions influence spectral signatures during grokking.
Findings
The feature repulsion sign rule holds across multiple seeds and activation functions.
Spectral signatures in the parameter updates depend critically on the activation derivative.
Grokking correlates with a rank-2 spectrum in the parameter update matrix.
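One way to make the rank-2 finding concrete is to measure how much of an update matrix's spectral energy sits in its two largest singular values. The sketch below is illustrative only: `top_k_energy` and its `update` argument are hypothetical names, and the squared-singular-value energy fraction is an assumed diagnostic, not necessarily the one used in the paper.

```python
import numpy as np

def top_k_energy(update, k=2):
    """Fraction of squared singular-value mass in the k largest singular values.

    `update` is a 2-D array standing in for a parameter update matrix
    (hypothetical input; how the paper extracts it is not stated here).
    """
    # Singular values come back sorted in descending order.
    s = np.linalg.svd(update, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0:
        return 0.0  # all-zero update: no spectrum to speak of
    return float(energy[:k].sum() / total)

# An exactly rank-2 matrix puts essentially all of its energy in the top two values.
rank2 = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0]) + np.outer([0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
print(top_k_energy(rank2))
```

A value close to 1 indicates an update that is approximately rank 2; a flat spectrum, such as that of an identity matrix, gives a much lower fraction.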
Abstract
Tian (2025) proves a repulsion theorem (Theorem 6) for the matrix during the interactive feature-learning stage of grokking: similar features have negative off-diagonal entries , producing an effective repulsive force that drives them apart. However, the theorem does not specify when this mechanism becomes empirically observable, nor whether it leaves a measurable spectral signature in the parameter updates. We test this directly on Tian's modular addition setup (, , MSE loss) and observe a clear structure-mechanism dissociation. The predicted sign rule holds robustly on the top-200 most-similar feature pairs across activations (empirical sign-match rising from 0.865 to 0.985 on across 5 seeds, and saturating at 1.000 on ). However, the spectral…
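The sign-rule check described above can be sketched as follows: rank feature pairs by similarity, keep the top k, and report the fraction whose off-diagonal entry is negative. This is a minimal sketch under stated assumptions. `sign_match_rate`, `features`, and `interaction` are hypothetical names; `interaction` stands in for the matrix from Theorem 6, whose symbol is missing from the text above, and cosine similarity is an assumed similarity measure.

```python
import numpy as np

def sign_match_rate(features, interaction, k=200):
    """Fraction of the k most-similar feature pairs with a negative
    off-diagonal entry in `interaction` (the predicted repulsion sign).

    features:    (n, d) array, one feature vector per row (hypothetical input).
    interaction: (n, n) array standing in for the matrix in the theorem.
    """
    # Cosine similarity between rows (assumed similarity measure).
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Visit each unordered pair once via the strict upper triangle.
    rows, cols = np.triu_indices(sim.shape[0], k=1)
    k = min(k, rows.size)
    # Indices of the k pairs with the highest similarity.
    top = np.argsort(sim[rows, cols])[::-1][:k]
    entries = interaction[rows[top], cols[top]]
    return float(np.mean(entries < 0))

# Toy data: features 0 and 1 are nearly parallel, feature 2 is nearly orthogonal.
toy_features = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
toy_interaction = np.array([[0.0, -0.5, 0.3],
                            [-0.5, 0.0, -0.1],
                            [0.3, -0.1, 0.0]])
print(sign_match_rate(toy_features, toy_interaction, k=1))
```

In the toy data the most similar pair (0, 1) has a negative entry, so the rate at `k=1` is 1.0, while including the dissimilar pair (0, 2) with its positive entry lowers the rate.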
