Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

Alexander Shevchenko; Marco Mondelli

arXiv:1912.10095·cs.LG·July 24, 2020·6 cites

TL;DR

This paper shows that over-parameterized neural networks trained with SGD have approximately connected loss landscapes and dropout-stable solutions. Both properties strengthen as network width increases, which makes the landscape more favorable to optimization.

Contribution

The paper proves that SGD solutions in over-parameterized networks are dropout stable and, as a consequence, are connected via piecewise linear paths along which the increase in loss vanishes as the number of neurons grows.

Findings

01

SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows.

02

The parameters found by SGD become increasingly dropout stable as the network becomes wider.

03

Results are dimension-free for two-layer networks and scale linearly with input dimension for multilayer networks.
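The first finding concerns the loss along a piecewise linear path between two solutions. A minimal sketch of how such a loss increase can be measured is given here, under loud assumptions: `ring_loss` is a hypothetical toy loss (not a neural network), the waypoint is chosen by hand, and the helper names `path_losses` and `loss_barrier` are illustrative. It is not the path construction used in the paper.

```python
import numpy as np

def path_losses(waypoints, loss_fn, steps=11):
    """Loss at evenly spaced points on each linear segment between waypoints."""
    losses = []
    for start, end in zip(waypoints[:-1], waypoints[1:]):
        for t in np.linspace(0.0, 1.0, steps):
            losses.append(loss_fn((1.0 - t) * start + t * end))
    return np.array(losses)

def loss_barrier(waypoints, loss_fn, steps=11):
    """Largest loss on the path minus the larger of the two endpoint losses."""
    endpoint_max = max(loss_fn(waypoints[0]), loss_fn(waypoints[-1]))
    return path_losses(waypoints, loss_fn, steps).max() - endpoint_max

def ring_loss(theta):
    """Hypothetical toy loss: zero exactly on the unit circle."""
    return (np.sum(theta ** 2) - 1.0) ** 2

theta_a = np.array([1.0, 0.0])   # first zero-loss solution
theta_b = np.array([-1.0, 0.0])  # second zero-loss solution
via = np.array([0.0, 1.0])       # hand-picked intermediate waypoint

direct = loss_barrier([theta_a, theta_b], ring_loss)       # single segment
detour = loss_barrier([theta_a, via, theta_b], ring_loss)  # two segments
print("direct barrier:", direct)
print("detour barrier:", detour)
```

In this toy setting the straight line between the two solutions crosses a high-loss region, while the two-segment path stays closer to the set of minimizers, so its loss increase is smaller.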

Abstract

The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total…
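The dropout-stability check described in the abstract (remove part of the neurons, rescale the remaining ones, compare the loss) can be sketched as follows. This is an illustration under stated assumptions, not the paper's experiment: the two-layer network averages its neurons (a 1/N scaling assumed here), the weights are random stand-ins rather than SGD solutions, and the data, target, and function names are hypothetical.

```python
import numpy as np

def two_layer(X, W, a):
    """Two-layer network: average of a_i * tanh(w_i . x) over all neurons."""
    return np.tanh(X @ W.T) @ a / len(a)

def dropout_loss_gap(X, y, W, a, keep):
    """Absolute change in squared loss when only the first `keep` neurons remain."""
    full = np.mean((two_layer(X, W, a) - y) ** 2)
    # Keeping `keep` neurons and averaging over `keep` instead of N
    # plays the role of rescaling the remaining neurons.
    sub = np.mean((two_layer(X, W[:keep], a[:keep]) - y) ** 2)
    return abs(full - sub)

rng = np.random.default_rng(0)
d, n, N = 5, 200, 4096            # input dimension, samples, neurons (assumed values)
X = rng.normal(size=(n, d))       # synthetic inputs
y = np.sin(X[:, 0])               # synthetic target
W = rng.normal(size=(N, d))       # random stand-in for trained first-layer weights
a = rng.normal(size=N)            # random stand-in for trained output weights

for keep in (N // 8, N // 2, N):
    print(keep, dropout_loss_gap(X, y, W, a, keep))
```

To probe the claim itself, `W` and `a` would be replaced by parameters obtained from SGD training, and the gap would be tracked as a function of `keep` for several total widths `N`.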

Videos

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks · slideslive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Applications

Methods: Dropout · Stochastic Gradient Descent