On the Benefit of Width for Neural Networks: Disappearance of Bad Basins

Dawei Li; Tian Ding; Ruoyu Sun

arXiv:1812.11039 · cs.LG · September 3, 2021 · 31 cites

Open Access

TL;DR

This paper proves a phase transition in neural network loss landscapes: as width increases, sub-optimal basins disappear, which helps explain the optimization benefits of wide networks.

Contribution

The paper proves that, as width increases, the loss surface of a neural network undergoes a phase transition from having sub-optimal basins to having none.

Findings

01

For any continuous activation function, the loss surface of a class of wide networks has no sub-optimal basins.

02

A large class of networks with width below a threshold admits strict local minima that are not global.

03

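Together, these two results show a phase transition from narrow to wide networks: sub-optimal basins can exist below a width threshold and are absent in the class of wide networks studied.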
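To make "strict local minimum that is not global" concrete, here is a minimal sketch on a one-dimensional toy function. It is not a neural network loss and is not taken from the paper; `toy_loss`, `strict_local_minima`, and the grid resolution are all hypothetical choices made for illustration.

```python
import numpy as np

def toy_loss(x):
    # Hypothetical tilted double well: two strict local minima,
    # only one of which is global. Not a neural network loss.
    return (x**2 - 1.0)**2 + 0.3 * x

def strict_local_minima(f, grid):
    """Return grid points whose value is strictly below both grid neighbours."""
    values = f(grid)
    interior = (values[1:-1] < values[:-2]) & (values[1:-1] < values[2:])
    return grid[1:-1][interior]

# Scan a fine grid and separate the global minimum from the sub-optimal one.
grid = np.linspace(-2.0, 2.0, 4001)
minima = strict_local_minima(toy_loss, grid)
global_min = minima[np.argmin(toy_loss(minima))]
suboptimal = [x for x in minima if toy_loss(x) > toy_loss(global_min)]
```

Here `suboptimal` holds the minimizer in the shallower well: a point that is strictly better than its neighbours yet worse than the global minimum. The paper's negative result concerns minima of this kind in narrow networks, and its positive result concerns their set-wise analogue disappearing in a class of wide networks.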

Abstract

Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove? To understand the benefit of width, it is important to identify the difference between wide and narrow networks. In this work, we prove that from narrow to wide networks, there is a phase transition from having sub-optimal basins to no sub-optimal basins. Specifically, we prove two results: on the positive side, for any continuous activation function, the loss surface of a class of wide networks has no sub-optimal basins, where "basin" is defined as the set-wise strict local minimum; on the negative side, for a large class of networks with width below a threshold, we construct strict local minima that are not global. These two results together show the phase transition from narrow to wide networks.
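The abstract's notion of a "set-wise strict local minimum" can be written out as follows. This is a sketch of one natural formalization; the symbols $f$, $S$, $X$, and $\varepsilon$ are notation chosen here for illustration, and the exact definition should be checked against the paper itself.

```latex
A compact set $X \subseteq S$ is a set-wise strict local minimum of
$f : S \to \mathbb{R}$ if there exists $\varepsilon > 0$ such that
\[
  f(x) < f(y)
  \quad \text{for all } x \in X \text{ and all } y \in S \setminus X
  \text{ with } \|x - y\|_2 \le \varepsilon .
\]
```

Under this reading, a sub-optimal basin is such a set on which the loss stays above its global infimum, and an ordinary strict local minimum is the special case where $X$ is a single point.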

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing

Methods: Affine Coupling · Normalizing Flows