SGD Converges to Global Minimum in Deep Learning via Star-convex Path

Yi Zhou; Junjie Yang; Huishuai Zhang; Yingbin Liang; Vahid Tarokh

arXiv:1901.00451·cs.LG·January 3, 2019·22 cites

TL;DR

This paper establishes that stochastic gradient descent (SGD) converges to a global minimum for nonconvex problems common in neural network training, under two properties: the training loss can reach approximately zero, and the SGD iterates follow a star-convex path, which the authors verify experimentally.

Contribution

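The paper establishes the convergence of SGD to a global minimum in deep neural network training by combining two properties: an (approximately) zero achievable training loss and a star-convex optimization path. Under these properties, the analysis shows that SGD, although randomized, converges in an intrinsically deterministic manner.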

Findings

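01 SGD follows a star-convex path during training, as verified by the paper's experiments

02 Under the zero-loss and star-convex-path properties, SGD converges to a global minimum

03 Training loss can reach (approximately) zero, as widely observed in deep learning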

Abstract

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding of how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In such a context, our analysis shows that SGD, although it has long been considered a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.
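
To make the two properties in the abstract concrete, here is a minimal sketch of how one might check a star-convexity condition along an SGD trajectory. It is not the paper's experimental setup. It assumes the standard first-order star-convexity inequality for each sampled loss, <grad l_i(w), w - w*> >= l_i(w) - l_i(w*), where w* is a zero-loss global minimizer; the paper's exact definition may differ. The toy problem (a realizable least-squares fit, which is convex and so satisfies the inequality by construction), the function name run_sgd_with_star_convex_check, and all hyperparameters are illustrative assumptions.

```python
import numpy as np


def run_sgd_with_star_convex_check(n=20, d=5, lr=0.05, steps=2000, seed=0):
    """Run SGD on a realizable least-squares problem and record, at every
    step, the residual  <grad, w - w_star> - (loss_i(w) - loss_i(w_star)).
    A nonnegative residual means the assumed star-convexity inequality
    holds at that iterate for the sampled example."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d)
    b = A @ w_star  # realizable targets, so the loss at w_star is zero

    def sample_loss(w, i):
        return 0.5 * (A[i] @ w - b[i]) ** 2

    def sample_grad(w, i):
        return (A[i] @ w - b[i]) * A[i]

    def full_loss(w):
        return 0.5 * np.mean((A @ w - b) ** 2)

    w = np.zeros(d)
    initial_loss = full_loss(w)
    residuals = []
    for _ in range(steps):
        i = rng.integers(n)  # sample one training example uniformly
        g = sample_grad(w, i)
        gap = sample_loss(w, i) - sample_loss(w_star, i)
        residuals.append(float(g @ (w - w_star) - gap))
        w = w - lr * g  # plain SGD step
    return np.array(residuals), initial_loss, full_loss(w)


residuals, initial_loss, final_loss = run_sgd_with_star_convex_check()
print("smallest residual along the path:", residuals.min())
print("initial loss:", initial_loss, "final loss:", final_loss)
```

On a real network, w* is unknown; a practical check would need a stand-in such as the final trained weights, which is an additional assumption beyond this sketch.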

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms

Methods: Stochastic Gradient Descent