Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
Brandon Livio Annesi, Dario Bocchi, Chiara Cammarota

TL;DR
This paper investigates how overparameterization affects the loss landscape of simple neural networks, revealing BBP transitions at initialization that influence the network's ability to learn signals.
Contribution
It introduces a field-theoretic analysis of Hessian spectra showing how overparameterization shifts the BBP transition point and alters the landscape's qualitative features.
Findings
Overparameterization shifts the BBP transition point.
Discontinuous transitions exhibit strong finite-N effects.
A new lower SNR threshold for uninformative initialization is proposed.
Abstract
High-dimensional non-convex loss landscapes play a central role in the theory of Machine Learning. Gaining insight into how these landscapes interact with gradient-based optimization methods, even in relatively simple models, can shed light on this enigmatic feature of neural networks. In this work, we will focus on a prototypical simple learning problem, which generalizes the Phase Retrieval inference problem by allowing the exploration of overparametrized settings. Using techniques from field theory, we analyze the spectrum of the Hessian at initialization and identify a Baik-Ben Arous-Péché (BBP) transition in the amount of data that separates regimes where the initialization is informative or uninformative about a planted signal of a teacher-student setup. Crucially, we demonstrate how overparameterization can bend the loss landscape, shifting the transition point, even reaching…
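For readers unfamiliar with the BBP phenomenon, the sketch below illustrates it in the textbook rank-one spiked Wigner model. This is not the Hessian studied in the paper: the noise matrix, the spike strength `theta` (standing in loosely for a signal-to-noise ratio), and the function names `spiked_wigner_top` and `bbp_prediction` are all assumptions made for illustration. In this toy model the top eigenvalue stays at the bulk edge 2 for `theta <= 1`, and for `theta > 1` it detaches to `theta + 1/theta` while the top eigenvector acquires squared overlap `1 - 1/theta**2` with the planted direction.

```python
import numpy as np


def spiked_wigner_top(n, theta, rng):
    """Top eigenvalue and squared overlap with the spike for W + theta * v v^T.

    Toy model only: W is a GOE-like matrix whose bulk spectrum fills [-2, 2]
    as n grows, and v is a random unit vector playing the role of the signal.
    """
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0 * n)      # symmetric noise, off-diagonal variance 1/n
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                # unit-norm planted direction
    vals, vecs = np.linalg.eigh(w + theta * np.outer(v, v))
    top_val = float(vals[-1])             # eigh returns eigenvalues in ascending order
    overlap2 = float(vecs[:, -1] @ v) ** 2
    return top_val, overlap2


def bbp_prediction(theta):
    """Large-n prediction for the same toy model."""
    if theta <= 1.0:
        return 2.0, 0.0                   # eigenvalue stuck at the bulk edge
    return theta + 1.0 / theta, 1.0 - 1.0 / theta**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for theta in (0.5, 1.0, 1.5, 2.0, 3.0):
        val, ov = spiked_wigner_top(1000, theta, rng)
        pred_val, pred_ov = bbp_prediction(theta)
        print(f"theta={theta:.1f}  top={val:.3f} (pred {pred_val:.3f})  "
              f"overlap^2={ov:.3f} (pred {pred_ov:.3f})")
```

Running the script compares one finite-size sample against the large-`n` prediction on both sides of the threshold. In the abstract's setting, the analogous control parameter is the amount of data, and "informative" corresponds to an eigenvector with non-vanishing overlap with the planted signal.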
Peer Reviews
Decision: ICLR 2026 Oral
1. An interesting perspective on neural network training through overparametrized phase retrieval. The theory involves machinery from quantum mechanics (which I did not fully understand), showing an interesting link between learning theory and physics.
2. The related works are cited extensively and mentioned appropriately in relevant parts of the manuscript.
1. Clarification of terminology is needed.
- Why would this be a "loss landscape" result? The result seems to be mostly about the Hessian "at initialization", so to me it is not natural to read it as a loss-landscape result (of course, Hessians and loss landscapes are related, but the training dynamics are not discussed).
- What does "bend the landscape" mean?
- What is the SNR that is repeated throughout the paper? I assume it would be alpha = M/N, I am wondering if the ter
This is a strong theoretical contribution. The idea that overparameterization changes the nature of the BBP transition, and that in the large-width limit one reaches the optimal weak-recovery threshold, is both interesting and novel. It deepens our understanding of why wide models are easier to train, and it connects two previously separate lines of work: loss-landscape curvature and spectral initialization. Original and timely topic: the interplay between overparameterization and loss-landscape g
The analysis is limited to quadratic activations, which makes it less clear how general the conclusions are. The field-theory derivations could be compressed; parts of the appendix are a bit heavy and "physics-style". It would have been nice to see a direct quantitative comparison with actual spectral initialization methods to highlight practical implications.
The paper is theoretically and numerically sound. It addresses the important question of how overparameterization affects the learning landscape, offering novel, quantitative results in the specific setting of quadratic two-layer networks. Its originality lies in extending previous analyses of phase retrieval to a more general framework, providing a detailed characterization of the BBP phenomenology and valuable insights on finite-size effects. Overall, the presentation is clear and connects the
1. One main weakness is the lack of a methodological overview in the main text. The technical analysis is confined to the appendix, leaving readers without intuition about the derivation. The paper could strongly benefit from a short but insightful methodological summary in the main section, possibly by shortening the (sometimes redundant) conclusion and/or using the additional page. 2. Some relevant references on the multi-index setting are missing. For instance, the critical SNR $p_\star/2$ h
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science · Quantum many-body systems
