The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks
Itay Safran, Gilad Yehudai, Ohad Shamir

TL;DR
This paper investigates how mild over-parameterization affects the optimization landscape of shallow ReLU neural networks, revealing that over-parameterization can transform local minima into saddle points, thus easing optimization.
Contribution
It analyzes how over-parameterization changes the optimization landscape of shallow ReLU networks: the strong convexity that holds around global minima is lost, while non-global minima become saddle points.
Findings
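Without over-parameterization, the objective is strongly convex around the global minima.
Any amount of over-parameterization destroys local convexity, along with related properties such as one-point strong convexity and the Polyak-Łojasiewicz condition.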
Adding a single neuron turns non-global minima into saddle points.
Abstract
We study the effects of mild over-parameterization on the optimization landscape of a simple ReLU neural network of the form $\mathbf{x} \mapsto \sum_{i=1}^{k} \max\{0, \mathbf{w}_i^\top \mathbf{x}\}$, in a well-studied teacher-student setting where the target values are generated by the same architecture, and when directly optimizing over the population squared loss with respect to Gaussian inputs. We prove that while the objective is strongly convex around the global minima when the teacher and student networks possess the same number of neurons, it is not even locally convex after any amount of over-parameterization. Moreover, related desirable properties (e.g., one-point strong convexity and the Polyak-Łojasiewicz condition) also do not hold even locally. On the other hand, we establish that the objective remains one-point strongly convex in most directions (suitably…
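The teacher-student setting described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the authors' code. It assumes the network is a plain sum of ReLU neurons, replaces the population loss with a Monte Carlo average over Gaussian samples, and uses a 1/2 factor in the loss as a convention. The names `relu_net` and `empirical_loss`, the dimensions, and the orthonormal teacher are all hypothetical choices made for the example.

```python
import numpy as np

def relu_net(W, X):
    """Sum-of-ReLUs network x -> sum_i max(0, w_i^T x).

    W has shape (neurons, d); X has shape (samples, d).
    """
    return np.maximum(X @ W.T, 0.0).sum(axis=1)

def empirical_loss(W_student, W_teacher, X):
    """Monte Carlo estimate of the squared loss over the rows of X.

    The 1/2 factor is an assumed convention for this sketch.
    """
    residual = relu_net(W_student, X) - relu_net(W_teacher, X)
    return 0.5 * np.mean(residual ** 2)

rng = np.random.default_rng(0)
d, k = 5, 3                               # input dimension, teacher neurons (hypothetical)
X = rng.standard_normal((20000, d))       # standard Gaussian inputs
W_teacher = np.eye(k, d)                  # orthonormal teacher (hypothetical choice)

# Student with one extra neuron (k + 1 in total). Splitting the first teacher
# neuron into two halves reproduces the teacher exactly, because
# max(0, a * z) = a * max(0, z) for any a >= 0.
W_split = np.vstack([0.5 * W_teacher[0], 0.5 * W_teacher[0], W_teacher[1:]])
W_random = rng.standard_normal((k + 1, d))

loss_at_split = empirical_loss(W_split, W_teacher, X)
loss_at_random = empirical_loss(W_random, W_teacher, X)
print(loss_at_split, loss_at_random)
```

Any split of the form `a * w` and `(1 - a) * w` with `a` in [0, 1] gives the same zero loss, so with an extra neuron the global minima are no longer isolated points. This is only a simple symptom consistent with the loss of strong convexity; the paper's stronger statement that even local convexity fails is not something this sketch demonstrates.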
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
