Non-attracting Regions of Local Minima in Deep and Wide Neural Networks
Henning Petzka, Cristian Sminchisescu

TL;DR
This paper investigates the loss surface of deep neural networks. It shows that suboptimal local minima exist for networks with sigmoid activations, but that they lie in connected sets that can be escaped along non-increasing paths, which helps explain why wide networks often perform well despite such minima.
Contribution
It constructs examples of suboptimal local minima in fully connected neural networks with sigmoid activations, shows that these minima belong to connected sets, and characterizes the conditions under which they turn into saddle points, advancing understanding of loss landscapes.
Findings
Suboptimal local minima exist in neural networks with sigmoid activation.
These minima form connected sets that can be escaped via non-increasing paths.
Conditions are characterized under which minima become saddle points.
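As a rough illustration of what a connected set of equal-loss parameters and a non-increasing path can look like, the sketch below uses the standard neuron-splitting identity in a tiny one-hidden-layer sigmoid network. This is an assumption chosen for illustration and is not claimed to be the authors' construction. The toy data, the helper names (`loss`, `split_neuron`, `is_non_increasing`) and the tolerance are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(params, data):
    """Mean squared error of a one-hidden-layer sigmoid network.

    params: list of (w, b, v) tuples, one per hidden neuron, with scalar
    input weight w, bias b, and outgoing weight v (illustrative layout).
    """
    total = 0.0
    for x, y in data:
        pred = sum(v * sigmoid(w * x + b) for w, b, v in params)
        total += (pred - y) ** 2
    return total / len(data)

def split_neuron(w, b, v, lam):
    """Embed one neuron into a two-neuron network that computes the same function.

    Both copies share (w, b); the outgoing weight v is split as lam*v and
    (1 - lam)*v, so the network output does not depend on lam.
    """
    return [(w, b, lam * v), (w, b, (1.0 - lam) * v)]

def is_non_increasing(losses, tol=1e-12):
    """True if the loss never rises (beyond tol) along the sampled path."""
    return all(nxt <= cur + tol for cur, nxt in zip(losses, losses[1:]))

# Toy data and an arbitrary narrow network (hypothetical values).
data = [(-1.0, 0.2), (0.0, 0.5), (1.0, 0.9), (2.0, 0.4)]
w, b, v = 0.7, -0.1, 0.8
narrow_loss = loss([(w, b, v)], data)

# Sample the line segment lam in [0, 1] in the wider network's parameter space.
path_losses = [loss(split_neuron(w, b, v, k / 10), data) for k in range(11)]

print(is_non_increasing(path_losses))
```

Every point on the sampled segment has the same loss as the narrow network up to floating-point error, so the segment is a connected set of equal-loss parameters and the path along it is trivially non-increasing. Whether a point on such a set is a local minimum or a saddle in the wider network is a separate question that this sketch does not test.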
Abstract
Understanding the loss surface of neural networks is essential for the design of models with predictable performance and their success in applications. Experimental results suggest that sufficiently deep and wide neural networks are not negatively impacted by suboptimal local minima. Despite recent progress, the reason for this outcome is not fully understood. Could deep networks have very few, if any, suboptimal local optima? Or could all of them be equally good? We provide a construction to show that suboptimal local minima (i.e., non-global ones), even though degenerate, exist for fully connected neural networks with sigmoid activation functions. The local minima obtained by our construction belong to a connected set of local solutions that can be escaped from via a non-increasing path on the loss curve. For extremely wide neural networks of decreasing width after the wide layer,…
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Sparse and Compressive Sensing Techniques
Methods: Sigmoid Activation
