The Features at Convergence Theorem: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
Enric Boix-Adsera, Neil Mallinar, James B. Simon, Mikhail Belkin

TL;DR
This paper introduces the Features at Convergence Theorem (FACT), a first-principles result derived from first-order optimality conditions that describes the features a neural network has learned at convergence. It agrees more closely with learned features than the Neural Feature Ansatz (NFA) and clarifies when the NFA holds and when it does not.
Contribution
The paper derives a new theorem (FACT) from first-order optimality conditions. The theorem agrees more closely with learned features at convergence than the NFA does, and offers a rigorous alternative to it.
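To make "first-order optimality conditions" concrete, here is a minimal sketch on a toy case that is an assumption of this note, not a setting taken from the paper: a linear model f(x) = w·x trained with squared loss and weight decay lam. Setting the gradient of the regularized objective to zero implies that the weight Gram matrix W^T W equals -(1/(n·lam)) times the sum over samples of (input gradient of the loss) (input)^T. The NFA's counterpart is the average gradient outer product (AGOP), which for a linear model is exactly w w^T. The function names and the choice of ridge regression are illustrative, and the precise statement of FACT in the paper may differ from this identity.

```python
import numpy as np

def ridge_stationary_point(X, y, lam):
    """Closed-form minimizer of (1/2n)||Xw - y||^2 + (lam/2)||w||^2 (assumed toy objective)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def feature_matrices(X, y, w, lam):
    """Return (W^T W, optimality-condition side, AGOP) for the linear model f(x) = w.x."""
    n = X.shape[0]
    residual = X @ w - y                       # d(loss_i)/d(f) for squared loss
    grad_x = residual[:, None] * w[None, :]    # per-sample gradient of loss_i w.r.t. x_i, shape (n, d)
    weight_gram = np.outer(w, w)               # W^T W with W = w^T (a 1 x d layer)
    # -(1/(n*lam)) * sum_i grad_x_i x_i^T, obtained from the stationarity condition
    optimality_side = -(grad_x.T @ X) / (n * lam)
    # AGOP: average of grad_x f grad_x f^T; for a linear model grad_x f = w for every sample
    agop = np.outer(w, w)
    return weight_gram, optimality_side, agop

# Illustrative usage on synthetic data (all values here are made up for the sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
lam = 0.1
w = ridge_stationary_point(X, y, lam)
weight_gram, optimality_side, agop = feature_matrices(X, y, w, lam)
print("max |W^T W - optimality side|:", np.abs(weight_gram - optimality_side).max())
```

In this toy case both the optimality-based identity and the NFA hold exactly, so it cannot distinguish them; it only illustrates how a statement about W^T W falls out of a stationarity condition. The cases where the two disagree involve nonlinear networks, which this sketch does not cover.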
Findings
FACT agrees more closely than the NFA with learned features at convergence
Explains why the NFA holds in most settings
Captures phenomena like grokking and phase transitions
Abstract
It is a central challenge in deep learning to understand how neural networks learn representations. A leading approach is the Neural Feature Ansatz (NFA) (Radhakrishnan et al. 2024), a conjectured mechanism for how feature learning occurs. Although the NFA is empirically validated, it is an educated guess and lacks a theoretical basis, and thus it is unclear when it might fail, and how to improve it. In this paper, we take a first-principles approach to understanding why this observation holds, and when it does not. We use first-order optimality conditions to derive the Features at Convergence Theorem (FACT), an alternative to the NFA that (a) obtains greater agreement with learned features at convergence, (b) explains why the NFA holds in most settings, and (c) captures essential feature learning phenomena in neural networks such as grokking behavior in modular arithmetic and phase…
