A Unified Analysis of Stochastic Momentum Methods for Deep Learning
Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang

TL;DR
This paper provides a unified theoretical framework for analyzing stochastic momentum methods such as SHB and SNAG in deep learning. It characterizes their effects on convergence and generalization, and supports the analysis with empirical validation.
Contribution
It introduces a unified analysis framework for stochastic momentum methods, clarifying their convergence and stability properties in deep learning training.
Findings
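In the convergence analysis of the training objective (rates for the norm of the gradient in the non-convex setting), SHB and SNAG show no advantage over SG.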
Momentum improves model stability and generalization.
Empirical results support theoretical insights.
Abstract
Stochastic momentum methods have been widely adopted in training deep neural networks. However, the theoretical analysis of their convergence on the training objective and of their generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method and the stochastic momentum methods, including two famous variants, i.e., the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive the convergence rates of the norm of gradient for the non-convex optimization problem, and analyze the generalization performance through the uniform stability approach. Particularly, the convergence analysis of the training objective shows that SHB and SNAG have no advantage over SG.…
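The abstract says one framework unifies SG, SHB, and SNAG but does not spell out the update rule. The sketch below shows one way such a unification can be written, and it is an assumption rather than a quotation from the paper: a gradient step `y`, an auxiliary step `ys` scaled by a parameter `s`, and a momentum correction `beta * (ys_next - ys_prev)`. Under this parameterization, `s = 0` reduces to heavy-ball, `s = 1` to Nesterov-style momentum, and `s = 1 / (1 - beta)` to plain gradient descent with step size `alpha / (1 - beta)`. The names `sum_step`, `run`, `alpha`, `beta`, and `s` are illustrative, and the toy quadratic uses exact gradients where a stochastic method would use mini-batch gradients.

```python
import numpy as np

def sum_step(x, ys_prev, grad, alpha, beta, s):
    """One step of an assumed unified momentum update.

    s = 0           -> heavy-ball style update
    s = 1           -> Nesterov style update
    s = 1/(1-beta)  -> plain gradient step with step size alpha/(1-beta)
    """
    g = grad(x)
    y_next = x - alpha * g         # ordinary gradient step
    ys_next = x - s * alpha * g    # auxiliary step, scaled by s
    x_next = y_next + beta * (ys_next - ys_prev)  # momentum correction
    return x_next, ys_next

def run(grad, x0, alpha, beta, s, steps):
    """Iterate the update from x0, initializing the auxiliary sequence at x0."""
    x = np.asarray(x0, dtype=float)
    ys = x.copy()  # assumed initialization: ys_0 = x_0
    for _ in range(steps):
        x, ys = sum_step(x, ys, grad, alpha, beta, s)
    return x

# Toy objective f(x) = 0.5 * x^T A x with an exact (noise-free) gradient.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x

x0 = np.array([1.0, 1.0])
alpha, beta = 0.01, 0.9

x_shb = run(grad, x0, alpha, beta, s=0.0, steps=50)
x_snag = run(grad, x0, alpha, beta, s=1.0, steps=50)
x_sg = run(grad, x0, alpha, beta, s=1.0 / (1.0 - beta), steps=50)
```

Exposing `s` as a single knob lets the same loop be compared across the three methods with only one argument changed, which is the practical appeal of a unified formulation.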
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications
