Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
Alexander Shevchenko, Marco Mondelli

TL;DR
This paper shows that, for over-parameterized neural networks trained with SGD, the loss landscape is approximately connected and the solutions are dropout stable. Both properties strengthen as the network width increases, which makes the landscape more favorable to optimization.
Contribution
The paper proves that SGD solutions of over-parameterized networks are connected via piecewise linear paths along which the loss barely increases, and that this connectivity follows from the dropout stability of those solutions.
Findings
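SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows.
Dropout stability increases with network width: removing part of the neurons and suitably rescaling the remaining ones changes the loss less and less as the network becomes wider.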
The results are dimension-free for two-layer networks and depend linearly on the input dimension for multilayer networks.
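To make the first finding concrete, the following is a minimal sketch of how the loss increase along a piecewise linear path could be measured. It is not the path construction from the paper: the helper `path_loss_increase`, the toy loss `ring_loss` (whose minima lie on the unit circle), and the chosen waypoints are all hypothetical and serve only as illustration.

```python
import numpy as np

def path_loss_increase(waypoints, loss_fn, steps_per_segment=51):
    """Largest loss found on the piecewise linear path through `waypoints`,
    minus the larger of the two endpoint losses (hypothetical helper)."""
    ts = np.linspace(0.0, 1.0, steps_per_segment)
    worst = -np.inf
    # Walk each straight segment between consecutive waypoints.
    for start, end in zip(waypoints[:-1], waypoints[1:]):
        for t in ts:
            worst = max(worst, loss_fn((1.0 - t) * start + t * end))
    endpoint = max(loss_fn(waypoints[0]), loss_fn(waypoints[-1]))
    return worst - endpoint

def ring_loss(theta):
    """Toy loss (an assumption for illustration): zero on the unit circle."""
    return float((np.sum(theta ** 2) - 1.0) ** 2)

p = np.array([1.0, 0.0])
q = np.array([-1.0, 0.0])
detour = np.array([0.0, 1.0])

# A single straight segment crosses the high-loss region near the origin.
direct_barrier = path_loss_increase([p, q], ring_loss)
# A two-segment path through an extra waypoint stays closer to the minima.
detour_barrier = path_loss_increase([p, detour, q], ring_loss)
```

In this toy setting the two-segment path has a much smaller barrier than the straight line, which is the kind of quantity the connectivity result bounds for SGD solutions.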
Abstract
The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total…
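The remove-and-rescale operation described in the abstract can be sketched numerically. The example below assumes a two-layer network with a 1/N output normalization, so that keeping half of the neurons and renormalizing rescales each kept neuron by a factor of 2. The weights are random rather than SGD-trained, and the regression target is hypothetical, so this only illustrates the mechanics of the measurement and not the theorem itself.

```python
import numpy as np

def two_layer(x, a, W):
    """Two-layer network f(x) = (1/N) * sum_i a_i * tanh(w_i . x).
    The 1/N normalization is an assumption of this sketch."""
    return np.tanh(x @ W.T) @ a / a.shape[0]

def drop_half(a, W):
    """Keep the first half of the neurons. Evaluating the smaller network
    with its own 1/N factor rescales each kept neuron by 2."""
    k = a.shape[0] // 2
    return a[:k], W[:k]

def dropout_gap(n_neurons, dim=10, n_samples=200, n_trials=20, seed=0):
    """Average absolute change in squared loss after dropping half the
    neurons, over several random draws of data and weights."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_trials):
        x = rng.standard_normal((n_samples, dim))
        y = np.tanh(x[:, 0])  # hypothetical regression target
        a = rng.standard_normal(n_neurons)
        W = rng.standard_normal((n_neurons, dim))
        full = np.mean((two_layer(x, a, W) - y) ** 2)
        dropped = np.mean((two_layer(x, *drop_half(a, W)) - y) ** 2)
        gaps.append(abs(full - dropped))
    return float(np.mean(gaps))

gap_narrow = dropout_gap(16)
gap_wide = dropout_gap(4096)
```

Comparing `gap_narrow` with `gap_wide` shows the change in loss shrinking as the width grows in this toy setting. For trained networks, the same measurement would be applied to the parameters returned by SGD.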
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Applications
Methods: Dropout · Stochastic Gradient Descent
