Benignity of loss landscape with weight decay requires both large overparametrization and initialization
Etienne Boursier, Matthew Bowditch, Matthias Englert, Ranko Lazic

TL;DR
This paper shows that for two-layer ReLU networks trained with weight decay, the loss landscape is benign (free of spurious local minima) only under large overparametrization, and that this benignity is relevant to training mainly when the initialization is also large.
Contribution
It gives a theoretical analysis of when the weight-decay-regularized loss landscape of two-layer ReLU networks is benign, identifying the network width required and the role of the initialization scale.
Findings
The landscape is benign under large overparametrization, when the width m satisfies m ≥ min(n^d, 2^n), with n the number of data points and d the input dimension.
In this regime, almost all constant activation regions contain a global minimum and no spurious local minima.
At small initialization, spurious minima can still arise; the benign-landscape results are relevant mainly at large initialization.
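The two quantities in the findings can be made concrete with a small sketch. It assumes the usual reading of an activation pattern for a ReLU neuron: the tuple recording which data points satisfy w·x > 0. A constant activation region is then a set of weights sharing one pattern. The function names and the sample values are illustrative and do not come from the paper.

```python
def width_threshold(n, d):
    """Width scale min(n^d, 2^n) quoted in the findings (constants ignored)."""
    return min(n ** d, 2 ** n)


def activation_pattern(w, X):
    """Which data points activate a ReLU neuron with weight vector w.

    Assumption: a point x is 'active' when the pre-activation w.x is
    strictly positive; weights with the same pattern lie in the same
    constant activation region.
    """
    return tuple(
        1 if sum(w_k * x_k for w_k, x_k in zip(w, x)) > 0 else 0
        for x in X
    )


# Illustrative values only: for small d the n^d term is the binding one,
# for larger d the 2^n term takes over.
for n, d in [(10, 2), (10, 5), (20, 3)]:
    print(f"n={n}, d={d}: min(n^d, 2^n) = {width_threshold(n, d)}")

X = [[1.0, 2.0], [-1.0, -1.0]]
print(activation_pattern([1.0, 0.0], X))   # first point active, second not
print(activation_pattern([0.0, -1.0], X))  # second point active, first not
```

The threshold grows quickly in either n or d, so the required width is far beyond what is typical in practice even for modest datasets.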
Abstract
The optimization of neural networks under weight decay remains poorly understood from a theoretical standpoint. While weight decay is standard practice in modern training procedures, most theoretical analyses focus on unregularized settings. In this work, we investigate the loss landscape of the ℓ2-regularized training loss for two-layer ReLU networks. We show that the landscape becomes benign -- i.e., free of spurious local minima -- under large overparametrization, specifically when the network width m satisfies m ≥ min(n^d, 2^n), where n is the number of data points and d the input dimension. More precisely in this regime, almost all constant activation regions contain a global minimum and no spurious local minima. We further show that this level of overparametrization is not only sufficient but also necessary via the example of orthogonal data. Finally, we…
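As a rough illustration of the objective the abstract refers to, the sketch below evaluates a weight-decay-regularized loss for a two-layer ReLU network. The specifics are assumptions made for the example, not taken from the paper: squared loss averaged as 1/(2n), a penalty of (λ/2) times the squared norm of all weights, no bias terms, and the names `predict`, `regularized_loss` and `lam`.

```python
def relu(z):
    return max(z, 0.0)


def predict(W, a, x):
    """Two-layer ReLU network: sum_j a_j * relu(w_j . x), no biases (assumed)."""
    return sum(
        a_j * relu(sum(w_k * x_k for w_k, x_k in zip(w_j, x)))
        for w_j, a_j in zip(W, a)
    )


def regularized_loss(W, a, X, y, lam):
    """Squared loss plus weight decay on both layers.

    Assumed form: (1 / 2n) * sum_i (f(x_i) - y_i)^2
                  + (lam / 2) * (sum_j ||w_j||^2 + sum_j a_j^2)
    """
    n = len(X)
    data_fit = sum((predict(W, a, x) - y_i) ** 2 for x, y_i in zip(X, y)) / (2 * n)
    penalty = (lam / 2) * (
        sum(w_k ** 2 for w_j in W for w_k in w_j) + sum(a_j ** 2 for a_j in a)
    )
    return data_fit + penalty


# Toy example with m = 2 neurons, n = 2 data points, d = 2 (illustrative values).
W = [[1.0, 0.0], [0.0, -1.0]]   # hidden-layer weights, one row per neuron
a = [1.0, 1.0]                  # output-layer weights
X = [[1.0, 2.0], [-1.0, -1.0]]
y = [1.0, 0.0]
print(regularized_loss(W, a, X, y, lam=0.1))
```

A spurious local minimum, in these terms, would be a pair (W, a) that no small perturbation improves but whose loss is strictly above the global minimum.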
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Model Reduction and Neural Networks
