Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization

Tianbao Yang; Qihang Lin; Zhe Li

arXiv:1604.03257 · math.OC · May 6, 2016 · 90 cites

TL;DR

This paper provides a unified convergence analysis framework for stochastic momentum methods in convex and non-convex optimization, offering insights into their similarities, differences, and practical performance in deep learning.

Contribution

It introduces a unified convergence analysis framework for stochastic momentum methods, bridging theory and practice, and compares their empirical performance in deep neural network training.
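For orientation, the unified update can be written with a single interpolation parameter s ≥ 0. This is our reading of the paper's formulation, and the notation is our assumption: α is the step size, β the momentum parameter, and G(x_k) a stochastic gradient evaluated at x_k.

```latex
\begin{aligned}
y_{k+1}     &= x_k - \alpha\, G(x_k), \\
y^{s}_{k+1} &= x_k - s\,\alpha\, G(x_k), \\
x_{k+1}     &= y_{k+1} + \beta\,\bigl(y^{s}_{k+1} - y^{s}_{k}\bigr), \qquad y^{s}_{0} = x_0 .
\end{aligned}
```

Under this parameterization, s = 0 gives the stochastic heavy-ball method, s = 1 gives the stochastic Nesterov method, and s = 1/(1-β) gives plain stochastic gradient descent with step size α/(1-β). For the last case, substituting into the third line yields x_{k+1} = x_k - α(1 + βs) G(x_k) whenever y^s_k = x_k, and 1 + β/(1-β) = 1/(1-β).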

Findings

1. The unified framework reveals similarities and differences among stochastic momentum methods.
2. The analysis explains the transition from the stochastic gradient method to Nesterov's and heavy-ball methods.
3. Empirical results show that Nesterov's method balances convergence speed and robustness.
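The relationships in the first two findings can be checked numerically. The sketch below implements the unified update as we understand it (iterates x, y, y^s with interpolation parameter s) and compares it step for step with textbook heavy-ball, Nesterov, and plain SGD. The quadratic objective, the oracle `make_oracle`, and all constants are hypothetical choices made for illustration; none of them come from the paper.

```python
import numpy as np


def make_oracle(A, b, sigma, n_steps, seed=0):
    """Hypothetical noisy gradient oracle for f(x) = 0.5 x^T A x - b^T x.

    The noise is pre-drawn, so every method sees the same noise at step k.
    """
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal((n_steps, b.shape[0]))

    def grad(x, k):
        return A @ x - b + noise[k]

    return grad


def sum_method(grad, x0, alpha, beta, s, n_steps):
    """Unified momentum update (our reading of the paper's formulation)."""
    x = x0.copy()
    ys_prev = x0.copy()  # y^s_0 = x_0
    for k in range(n_steps):
        g = grad(x, k)
        y = x - alpha * g              # plain gradient step
        ys = x - s * alpha * g         # s-scaled gradient step
        x = y + beta * (ys - ys_prev)  # momentum applied to the y^s sequence
        ys_prev = ys
    return x


def heavy_ball(grad, x0, alpha, beta, n_steps):
    x_prev, x = x0.copy(), x0.copy()  # convention x_{-1} = x_0
    for k in range(n_steps):
        # Right-hand side uses the old x and x_prev before reassignment.
        x_prev, x = x, x - alpha * grad(x, k) + beta * (x - x_prev)
    return x


def nesterov(grad, x0, alpha, beta, n_steps):
    x, y_prev = x0.copy(), x0.copy()  # convention y_0 = x_0
    for k in range(n_steps):
        y = x - alpha * grad(x, k)
        x = y + beta * (y - y_prev)
        y_prev = y
    return x


def sgd(grad, x0, eta, n_steps):
    x = x0.copy()
    for k in range(n_steps):
        x = x - eta * grad(x, k)
    return x


A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x0 = np.zeros(2)
alpha, beta, T = 0.01, 0.9, 200
grad = make_oracle(A, b, sigma=0.1, n_steps=T)

pairs = {
    "heavy-ball, s = 0": (sum_method(grad, x0, alpha, beta, 0.0, T),
                          heavy_ball(grad, x0, alpha, beta, T)),
    "Nesterov, s = 1": (sum_method(grad, x0, alpha, beta, 1.0, T),
                        nesterov(grad, x0, alpha, beta, T)),
    "SGD, s = 1/(1-beta)": (sum_method(grad, x0, alpha, beta, 1.0 / (1.0 - beta), T),
                            sgd(grad, x0, alpha / (1.0 - beta), T)),
}
for name, (unified, classical) in pairs.items():
    print(f"{name}: {np.allclose(unified, classical)}")  # True for all three
```

Sharing one pre-drawn noise sequence is what makes the comparison exact: all runs see identical stochastic perturbations at step k, so any mismatch would come from the update rules rather than from sampling. For s = 1/(1-β) the match holds up to floating-point rounding, because the gap x_k - y^s_k is multiplied by β at every step and therefore stays at zero apart from damped rounding error.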

Abstract

Recently, stochastic momentum methods have been widely adopted in training deep neural networks. However, their convergence analysis remains underexplored, in particular for non-convex optimization. This paper fills the gap between practice and theory by developing a basic convergence analysis of two stochastic momentum methods, namely the stochastic heavy-ball method and the stochastic variant of Nesterov's accelerated gradient method. We hope that the basic convergence results developed in this paper can serve as a reference for the convergence of stochastic momentum methods and as baselines for comparison in the future development of stochastic momentum methods. The novelty of the convergence analysis presented in this paper is a unified framework, revealing more insights about the similarities and differences between different stochastic momentum methods and…
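In their usual textbook form, the two methods named in the abstract can be written as follows. The notation is our assumption: α is the step size, β the momentum parameter, and G(x_k) a stochastic gradient at x_k. Expanding the Nesterov recursion shows that it equals the heavy-ball step plus a correction built from consecutive stochastic gradients.

```latex
\begin{aligned}
\text{SHB:}\quad  & x_{k+1} = x_k - \alpha\, G(x_k) + \beta\,(x_k - x_{k-1}), \\
\text{SNAG:}\quad & y_{k+1} = x_k - \alpha\, G(x_k), \qquad x_{k+1} = y_{k+1} + \beta\,(y_{k+1} - y_k), \\
\text{SNAG, expanded } (k \ge 1):\quad & x_{k+1} = x_k - \alpha\, G(x_k) + \beta\,(x_k - x_{k-1}) - \alpha\beta\,\bigl(G(x_k) - G(x_{k-1})\bigr).
\end{aligned}
```

The expanded line follows from y_{k+1} - y_k = (x_k - x_{k-1}) - α(G(x_k) - G(x_{k-1})), which makes the structural difference between the two methods a single gradient-difference term.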

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
