Are Saddles Good Enough for Deep Learning?
Adepu Ravi Sankar, Vineeth N Balasubramanian

TL;DR
This paper challenges the common belief that deep neural networks converge to low-cost local minima, proposing instead that they converge to saddle points with high degeneracy, which bears on the design of gradient-descent-based training methods.
Contribution
The paper introduces the hypothesis that deep networks converge to high-degeneracy saddle points and validates it experimentally on standard datasets such as MNIST and CIFAR-10.
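To make "high-degeneracy saddle point" concrete, the sketch below summarizes the eigenvalue spectrum of a Hessian at a critical point. It is an illustration only, not the authors' procedure: the function name `spectrum_summary`, the tolerance `tol`, and the use of the fraction of near-zero eigenvalues as a degeneracy measure are all assumptions made for this example.

```python
import numpy as np

def spectrum_summary(hessian, tol=1e-6):
    """Count negative, near-zero, and positive eigenvalues of a symmetric Hessian.

    `tol` is a hypothetical cutoff for treating an eigenvalue as zero.
    """
    eigvals = np.linalg.eigvalsh(hessian)  # assumes a symmetric matrix
    n_neg = int(np.sum(eigvals < -tol))
    n_zero = int(np.sum(np.abs(eigvals) <= tol))
    n_pos = int(np.sum(eigvals > tol))

    # Loose labels: with zero eigenvalues present, the second-order test
    # alone cannot fully settle the type of critical point.
    if n_neg > 0 and n_pos > 0:
        kind = "saddle"
    elif n_neg == 0:
        kind = "minimum-like"
    else:
        kind = "maximum-like"

    return {
        "kind": kind,
        "n_neg": n_neg,
        "n_zero": n_zero,
        "n_pos": n_pos,
        # Assumed degeneracy measure: share of flat (near-zero) directions.
        "degeneracy": n_zero / eigvals.size,
    }

# Toy Hessian of f(w) = w0^2 - w1^2, with two flat directions w2 and w3.
toy_hessian = np.diag([2.0, -2.0, 0.0, 0.0])
summary = spectrum_summary(toy_hessian)
print(summary["kind"], summary["degeneracy"])
```

For a real network the Hessian is far too large to form densely, so in practice one would likely estimate the spectrum with Hessian-vector products; that step is outside the scope of this sketch.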
Findings
Deep networks often converge to high-degeneracy saddle points.
Recent saddle-escaping methods still end up at high-degeneracy saddles, which the authors call 'good saddles'.
Experimental results support Wigner's Semicircle Law in neural network training.
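As background for the last finding, Wigner's Semicircle Law says that the eigenvalues of a large random symmetric matrix, suitably scaled, follow the density sqrt(4 - x^2) / (2*pi) on [-2, 2]. The sketch below checks this numerically for a Gaussian random matrix. It does not reproduce the paper's experiments on network Hessians; the matrix size, seed, bin count, and function names are assumptions chosen for illustration.

```python
import numpy as np

def wigner_eigenvalues(n=400, seed=0):
    """Eigenvalues of an n x n random symmetric matrix scaled for the semicircle law."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, n))
    # Symmetrize and scale so off-diagonal entries have variance 1/n.
    a = (g + g.T) / np.sqrt(2.0 * n)
    return np.linalg.eigvalsh(a)

def semicircle_density(x):
    """Semicircle density on [-2, 2]; zero outside that interval."""
    return np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2.0 * np.pi)

def mean_abs_deviation(eigvals, bins=40):
    """Mean absolute gap between the empirical histogram and the semicircle density."""
    hist, edges = np.histogram(eigvals, bins=bins, range=(-2.2, 2.2), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return float(np.mean(np.abs(hist - semicircle_density(centers))))

eigs = wigner_eigenvalues()
print("largest |eigenvalue|:", np.max(np.abs(eigs)))
print("mean squared eigenvalue:", np.mean(eigs**2))
print("mean abs deviation from semicircle:", mean_abs_deviation(eigs))
```

If the law holds, the largest absolute eigenvalue should sit near 2, the mean squared eigenvalue near 1, and the histogram deviation should shrink as `n` grows.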
Abstract
Recent years have seen a growing interest in understanding deep neural networks from an optimization perspective. It is understood now that converging to low-cost local minima is sufficient for such models to become effective in practice. However, in this work, we propose a new hypothesis based on recent theoretical findings and empirical studies that deep neural network models actually converge to saddle points with high degeneracy. Our findings from this work are new, and can have a significant impact on the development of gradient-descent-based methods for training deep networks. We validated our hypotheses using an extensive experimental evaluation on standard datasets such as MNIST and CIFAR-10, and also showed that recent efforts that attempt to escape saddles finally converge to saddles with high degeneracy, which we define as 'good saddles'. We also verified the famous Wigner's…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
