The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks

Itay Safran; Gilad Yehudai; Ohad Shamir

arXiv:2006.01005 · cs.LG · August 2, 2021 · 6 cites

TL;DR

This paper studies how mild over-parameterization changes the optimization landscape of shallow ReLU neural networks. Adding neurons removes local convexity around the global minima, yet it can also turn non-global minima into saddle points, which may make optimization easier.

Contribution

It analyzes how the landscape of shallow ReLU networks changes under over-parameterization: strong convexity around the global minima is lost, while non-global minima can become saddle points.

Findings

01

The objective is strongly convex around the global minima when there is no over-parameterization, that is, when the teacher and student networks have the same number of neurons.

02

Any amount of over-parameterization destroys local convexity and related properties (one-point strong convexity and the Polyak-Łojasiewicz condition).

03

Adding a single neuron turns non-global minima into saddle points.

Abstract

We study the effects of mild over-parameterization on the optimization landscape of a simple ReLU neural network of the form $x \mapsto \sum_{i=1}^{k} \max\{0, w_i^\top x\}$, in a well-studied teacher-student setting where the target values are generated by the same architecture, and when directly optimizing over the population squared loss with respect to Gaussian inputs. We prove that while the objective is strongly convex around the global minima when the teacher and student networks possess the same number of neurons, it is not even locally convex after any amount of over-parameterization. Moreover, related desirable properties (e.g., one-point strong convexity and the Polyak-Łojasiewicz condition) also do not hold even locally. On the other hand, we establish that the objective remains one-point strongly convex in most directions (suitably…
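The objective described in the abstract can be evaluated numerically. The following is a minimal sketch, not the authors' code. It assumes standard Gaussian inputs $x \sim \mathcal{N}(0, I)$ and a loss of the form $\frac{1}{2}\,\mathbb{E}_x\big[(\sum_i \max\{0, w_i^\top x\} - \sum_j \max\{0, v_j^\top x\})^2\big]$, where the factor $\frac{1}{2}$ is a convention chosen here. It relies on the known closed form $\mathbb{E}[\max\{0, w^\top x\}\max\{0, v^\top x\}] = \frac{\|w\|\|v\|}{2\pi}(\sin\theta + (\pi - \theta)\cos\theta)$, with $\theta$ the angle between $w$ and $v$. The names `relu_gaussian_kernel` and `population_loss` are hypothetical.

```python
import numpy as np

def relu_gaussian_kernel(w, v):
    """Closed form of E[max(0, w.x) * max(0, v.x)] for x ~ N(0, I)."""
    nw, nv = np.linalg.norm(w), np.linalg.norm(v)
    if nw == 0.0 or nv == 0.0:
        return 0.0
    # Clip guards against rounding pushing the cosine outside [-1, 1].
    cos_t = np.clip(np.dot(w, v) / (nw * nv), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nw * nv * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)

def population_loss(W, V):
    """0.5 * E[(student(x) - teacher(x))^2]; rows of W, V are neuron weights.

    The 0.5 factor is an assumed convention, not taken from the source.
    """
    def pair_sum(A, B):
        return sum(relu_gaussian_kernel(a, b) for a in A for b in B)
    # Expand the square into student-student, cross, and teacher-teacher terms.
    return 0.5 * (pair_sum(W, W) - 2.0 * pair_sum(W, V) + pair_sum(V, V))

# Teacher with k = 3 orthonormal neurons (an illustrative choice).
V = np.eye(3)

# Student with one extra neuron: the third teacher neuron is split in two.
W_split = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
])

print(population_loss(V, V))        # exactly parameterized student
print(population_loss(W_split, V))  # over-parameterized student
```

Both printed values are zero up to floating-point error. The second case uses the positive homogeneity of ReLU: two half-length copies of a neuron compute the same function as the original, so an over-parameterized student has a continuum of global minima rather than isolated ones.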

Code & Models

Repositories

ItaySafran/Overparameterization
Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
