Gradient Descent Finds Global Minima of Deep Neural Networks

Simon S. Du; Jason D. Lee; Haochuan Li; Liwei Wang; Xiyu Zhai

arXiv:1811.03804 · cs.LG · May 30, 2019 · 199 cites

TL;DR

This paper proves that gradient descent finds a global minimum of a deep over-parameterized neural network with residual connections (ResNet), reaching zero training loss in polynomial time. The proof rests on showing that the Gram matrix induced by the network architecture stays stable throughout training.

Contribution

The paper provides the first theoretical proof that gradient descent converges to a global minimum of a deep residual neural network in polynomial time.

Findings

01

Gradient descent achieves zero training loss in polynomial time for ResNets.

02

The Gram matrix induced by the network architecture remains stable throughout training.

03

The analysis extends to deep residual convolutional neural networks, with a similar convergence result.
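The link between Gram matrix stability and zero training loss can be sketched in a few lines. The derivation below is a simplified continuous-time (gradient flow) illustration, not the paper's discrete-time proof. The notation is assumed here for illustration: u(t) is the vector of network predictions on the n training points, y the labels, theta the parameters, and lambda_0 a positive lower bound on the least eigenvalue of the Gram matrix.

```latex
\begin{align*}
L(t) &= \tfrac12 \lVert u(t) - y \rVert_2^2, \qquad
H_{ij}(t) = \Big\langle \frac{\partial u_i(t)}{\partial \theta},
                        \frac{\partial u_j(t)}{\partial \theta} \Big\rangle, \\
\frac{d u(t)}{dt} &= -H(t)\,\big(u(t) - y\big), \\
\frac{d}{dt} \lVert u(t) - y \rVert_2^2
  &= -2\,\big(u(t)-y\big)^\top H(t)\,\big(u(t)-y\big)
   \le -\lambda_0 \lVert u(t) - y \rVert_2^2
  \quad \text{if } \lambda_{\min}\big(H(t)\big) \ge \tfrac{\lambda_0}{2}, \\
\lVert u(t) - y \rVert_2^2 &\le e^{-\lambda_0 t}\, \lVert u(0) - y \rVert_2^2 .
\end{align*}
```

Under these assumptions, a Gram matrix whose least eigenvalue stays bounded away from zero forces the training loss to decay exponentially, which is why stability of the Gram matrix matters.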

Abstract

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.
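The stability argument can be observed numerically in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's construction: it uses a wide two-layer ReLU network instead of a deep ResNet, trains only the hidden layer, and uses random unit-norm inputs. The function names (`gram_matrix`, `train`) and all hyperparameters are hypothetical choices made for this example.

```python
import numpy as np


def gram_matrix(W, X):
    """H[i, j] = <x_i, x_j> * (fraction of neurons active on both x_i and x_j)."""
    active = (X @ W.T >= 0).astype(float)   # (n, m) ReLU activation pattern
    return (X @ X.T) * (active @ active.T) / W.shape[0]


def train(n=5, d=20, m=10000, lr=0.5, steps=200, seed=0):
    """Full-batch gradient descent on a wide two-layer ReLU network (toy setup)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    y = rng.normal(size=n)                           # arbitrary real labels
    W = rng.normal(size=(m, d))                      # trained hidden layer
    a = rng.choice([-1.0, 1.0], size=m)              # fixed output signs

    H0 = gram_matrix(W, X)
    losses = []
    for _ in range(steps):
        pre = X @ W.T                                 # (n, m) pre-activations
        u = np.maximum(pre, 0.0) @ a / np.sqrt(m)     # network outputs, shape (n,)
        r = u - y                                     # residuals
        losses.append(0.5 * float(r @ r))             # squared loss before the update
        # dL/dw_k = (1/sqrt(m)) * a_k * sum_i r_i * 1[pre_ik >= 0] * x_i
        G = r[:, None] * (pre >= 0) * a[None, :]      # (n, m)
        W -= lr * (G.T @ X) / np.sqrt(m)
    return np.array(losses), H0, gram_matrix(W, X)


if __name__ == "__main__":
    losses, H0, H_final = train()
    print("initial loss:", losses[0], "final loss:", losses[-1])
    print("min eigenvalue of Gram matrix, start:", np.linalg.eigvalsh(H0)[0])
    print("min eigenvalue of Gram matrix, end:  ", np.linalg.eigvalsh(H_final)[0])
```

With a large hidden width `m`, the weights move only slightly from initialization, so the Gram matrix at the end of training should stay close to its initial value while the loss shrinks toward zero. Reducing `m` is one way to see the stability degrade.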

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM