Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size
Kento Imaizumi, Hideaki Iiduka

TL;DR
This paper analyzes the convergence properties of quasi-hyperbolic momentum (QHM) in stochastic nonconvex optimization, demonstrating that increasing batch size without decaying learning rate can improve neural network training.
Contribution
The paper provides the first combined asymptotic and non-asymptotic convergence analysis of mini-batch QHM with increasing batch size, and draws practical implications for training strategies such as growing the batch size instead of decaying the learning rate.
Findings
Increasing batch size improves convergence in neural network training.
Asymptotic convergence requires either a decaying learning rate or an increasing batch size.
Finite increases in batch size can enhance training efficiency.
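A standard intuition for why a growing batch can stand in for learning-rate decay is that averaging more i.i.d. per-sample gradients shrinks the variance of the mini-batch gradient estimate, roughly as σ²/b for batch size b. The minimal sketch below checks this numerically under a toy assumption (zero true gradient plus Gaussian per-sample noise); the function `minibatch_grad_variance` and its parameters are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def minibatch_grad_variance(batch_size, sigma=1.0, trials=20000, seed=0):
    """Empirical variance of a mini-batch gradient estimate (toy model).

    Assumption: each per-sample gradient equals the true gradient (taken
    as 0 here) plus i.i.d. Gaussian noise with standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    # Each row holds the per-sample gradient noise for one mini-batch.
    samples = rng.normal(0.0, sigma, size=(trials, batch_size))
    # The mini-batch gradient is the average over the batch.
    estimates = samples.mean(axis=1)
    return estimates.var()


for b in [1, 4, 16, 64]:
    # The empirical variance should track the theoretical sigma**2 / b.
    print(b, minibatch_grad_variance(b), 1.0 / b)
```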
Abstract
Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks -- a representative case of stochastic nonconvex optimization -- the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we…
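The mini-batch QHM setting described in the abstract can be sketched as follows, assuming the commonly used QHM parameterization with momentum factor β and immediate-discount factor ν (ν = 0 reduces to plain SGD, ν = 1 to momentum SGD with an exponential-moving-average buffer); the paper's exact formulation may differ. The toy objective f(θ) = ½‖θ‖², the Gaussian noise model, and the batch-doubling schedule (`qhm_increasing_batch`, `double_every`, `b_max`) are hypothetical choices for illustration, not the authors' experimental setup. The learning rate is held constant while the batch size grows.

```python
import numpy as np


def qhm_increasing_batch(steps=3000, lr=0.1, beta=0.9, nu=0.7, b0=8,
                         double_every=500, b_max=1024, sigma=1.0,
                         dim=10, seed=0):
    """Mini-batch QHM with a constant learning rate on f(theta) = 0.5*||theta||^2.

    Hypothetical schedule: the batch size doubles every `double_every`
    steps, capped at `b_max`.
    """
    rng = np.random.default_rng(seed)
    theta = np.full(dim, 5.0)
    g = np.zeros(dim)  # momentum buffer: exponential moving average of gradients
    batch = b0
    history = []
    for t in range(steps):
        if t > 0 and t % double_every == 0:
            batch = min(2 * batch, b_max)
        # Averaging `batch` i.i.d. N(0, sigma^2) noise terms gives N(0, sigma^2 / batch).
        noise = rng.normal(0.0, sigma / np.sqrt(batch), size=dim)
        grad = theta + noise  # stochastic gradient; the true gradient is theta
        g = beta * g + (1.0 - beta) * grad
        # QHM step: a nu-weighted mix of the momentum buffer and the current gradient.
        theta = theta - lr * ((1.0 - nu) * grad + nu * g)
        history.append(float(np.dot(theta, theta)))  # ||grad f(theta)||^2 for this f
    return theta, batch, history


theta_inc, b_final, hist_inc = qhm_increasing_batch()
theta_const, _, hist_const = qhm_increasing_batch(double_every=10**9)  # batch stays at b0
print(b_final)
print(np.mean(hist_inc[-200:]), np.mean(hist_const[-200:]))
```

In this toy run, the constant-batch variant settles at a noise floor proportional to σ²/b₀, while the doubling schedule lowers that floor in expectation each time the batch grows, even though the learning rate never decays. This only illustrates the trade-off described in the abstract; it does not reproduce the paper's results.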
