How Can Deep Neural Networks Fail Even With Global Optima?
Qingguang Guan

TL;DR
This paper investigates why deep neural networks can fail even when training reaches a global optimum. It shows that extremely overfitting models that attain global minima of the cost function can still perform poorly on classification and function approximation tasks.
Contribution
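It extends the expressive power of shallow neural networks to networks of any depth using a simple trick. It also constructs extremely overfitting deep networks that attain global optima yet still fail on classification and function approximation.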
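The summary does not spell out which "simple trick" the paper uses to carry shallow-network results to arbitrary depth. The sketch below shows one standard way to deepen a ReLU network without changing the function it computes, and it is an assumption, not the paper's construction. After a ReLU the hidden vector is non-negative, and ReLU leaves non-negative inputs unchanged, so any number of extra layers with identity weights and zero bias act as exact identities. The same argument holds for Parametric ReLU but not for the Sigmoid as-is. All names and weights here (`shallow_net`, `deepened_net`, `W`, `b`, `a`, `c`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, t) for t in v]


def dense(W: Matrix, b: Vector, v: Vector) -> Vector:
    # Affine map W v + b, one output per row of W
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def shallow_net(x: float, W: Matrix, b: Vector, a: Vector, c: float) -> float:
    # One hidden ReLU layer followed by a linear read-out: a . ReLU(W x + b) + c
    h = relu(dense(W, b, [x]))
    return sum(ai * hi for ai, hi in zip(a, h)) + c


def deepened_net(x: float, W: Matrix, b: Vector, a: Vector, c: float,
                 extra_layers: int) -> float:
    h = relu(dense(W, b, [x]))
    n = len(h)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    zeros = [0.0] * n
    for _ in range(extra_layers):
        # h >= 0 after a ReLU, so ReLU(I h + 0) == h: each added layer is an exact identity
        h = relu(dense(identity, zeros, h))
    return sum(ai * hi for ai, hi in zip(a, h)) + c


# Hypothetical example weights (input dimension 1, three hidden units)
W = [[1.0], [-2.0], [0.5]]
b = [0.0, 1.0, -0.25]
a = [1.0, -1.0, 2.0]
c = 0.3

xs = [-2.0, -0.5, 0.0, 0.7, 3.0]
shallow_out = [shallow_net(x, W, b, a, c) for x in xs]
deep_out = [deepened_net(x, W, b, a, c, extra_layers=5) for x in xs]
```

Because the shallow and deepened networks compute the same function, any function a shallow ReLU network can represent is also representable at every greater depth under this construction.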
Findings
Overfitting deep networks can still perform poorly.
Global optima do not guarantee good generalization.
Theoretical analysis supports empirical results.
Abstract
Fully connected deep neural networks are successfully applied to classification and function approximation problems. By minimizing the cost function, i.e., finding the proper weights and biases, models can be built for accurate predictions. The ideal optimization process can achieve global optima. However, do global optima always perform well? If not, how bad can it be? In this work, we aim to: 1) extend the expressive power of shallow neural networks to networks of any depth using a simple trick, 2) construct extremely overfitting deep neural networks that, despite having global optima, still fail to perform well on classification and function approximation problems. Different types of activation functions are considered, including ReLU, Parametric ReLU, and Sigmoid functions. Extensive theoretical analysis has been conducted, ranging from one-dimensional models to models of any…
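To make "global optima that still fail" concrete, the hand-built toy below is not the paper's construction; it is an illustrative assumption. It uses a one-hidden-layer ReLU network to fit the target f(x) = x on the training points x_i = i/n. The network reproduces the identity exactly via x = ReLU(x) - ReLU(-x) and adds a narrow triangular spike of a chosen `height` at each midpoint between training points. Each spike is built from three ReLU units and vanishes at every training point. The training mean squared error is therefore zero up to floating-point rounding, which is the global minimum of a non-negative cost. At the unseen midpoints, however, the error is about `height**2`. The names `spiky_net`, `height`, and `delta` are hypothetical.

```python
from typing import Callable, List


def relu(z: float) -> float:
    return max(0.0, z)


def spiky_net(x: float, midpoints: List[float], delta: float, height: float) -> float:
    """One hidden ReLU layer: the identity plus a narrow hat of `height` at each midpoint."""
    # x == ReLU(x) - ReLU(-x), so the identity costs two hidden ReLU units
    out = relu(x) - relu(-x)
    for m in midpoints:
        # Triangular hat: 1 at x == m, 0 outside [m - delta, m + delta]; three ReLU units
        hat = (relu(x - (m - delta)) - 2.0 * relu(x - m) + relu(x - (m + delta))) / delta
        out += height * hat
    return out


def mse(xs: List[float], ys: List[float], f: Callable[[float], float]) -> float:
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


n = 10
height = 10.0
train_x = [i / n for i in range(n + 1)]
train_y = list(train_x)                          # target function f(x) = x
midpoints = [(k + 0.5) / n for k in range(n)]    # unseen test inputs
delta = 0.25 / n                                 # narrower than the gap 0.5 / n to any training point


def net(x: float) -> float:
    return spiky_net(x, midpoints, delta, height)


train_loss = mse(train_x, train_y, net)    # global minimum: zero up to rounding
test_loss = mse(midpoints, midpoints, net)  # close to height ** 2
print(f"train MSE = {train_loss:.3e}, test MSE at midpoints = {test_loss:.3f}")
```

Raising `height` or shrinking `delta` leaves the training loss at its global minimum while making the test error as large as desired. This is the qualitative behavior the abstract describes, shown here only in one dimension with ReLU.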
