SGEM: stochastic gradient with energy and momentum
Hailiang Liu, Xuping Tian

TL;DR
SGEM is a stochastic optimization method that combines energy and momentum. It comes with stability and convergence guarantees for non-convex problems and, in the reported experiments on training deep neural networks, converges faster than AEGD and generalizes at least as well as SGDM.
Contribution
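Introduces SGEM, a stochastic gradient method built on AEGD that integrates energy and momentum, with unconditional energy stability, energy-dependent convergence rates for general non-convex stochastic problems, and a regret bound in the online convex setting.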
Findings
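SGEM converges faster than AEGD in the reported experiments on training deep neural networks.
SGEM generalizes better than, or at least as well as, SGDM in the same experiments.
The analysis provides energy-dependent convergence rates in the non-convex stochastic setting, a regret bound in the online convex setting, and a lower threshold for the energy variable.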
Abstract
In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a large class of general non-convex stochastic optimization problems, based on the AEGD method that originated in the work [AEGD: Adaptive Gradient Descent with Energy. arXiv: 2010.05109]. SGEM incorporates both energy and momentum at the same time so as to inherit their dual advantages. We show that SGEM features an unconditional energy stability property, and derive energy-dependent convergence rates in the general nonconvex stochastic setting, as well as a regret bound in the online convex setting. A lower threshold for the energy variable is also provided. Our experimental results show that SGEM converges faster than AEGD and generalizes better or at least as well as SGDM in training some deep neural networks.
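The abstract describes SGEM as an AEGD-style energy update driven by a momentum average of stochastic gradients. The sketch below illustrates one way such an update can look in plain Python. The update formulas, the function names `sgem_step` and `run_demo`, the default hyperparameters (`lr`, `beta`, `c`), and the noisy quadratic test problem are all assumptions made for illustration; none of them are taken from this page, and the paper should be consulted for the authors' exact algorithm.

```python
import math
import random


def sgem_step(theta, m, r, grad, loss, t, lr=0.1, beta=0.9, c=1.0):
    """One illustrative energy-plus-momentum update (assumed form, t starts at 1)."""
    scale = 2.0 * math.sqrt(loss + c)   # assumes loss + c > 0
    bias = 1.0 - beta ** t              # bias correction for the momentum average
    new_theta, new_m, new_r = [], [], []
    for th, mi, ri, gi in zip(theta, m, r, grad):
        mi = beta * mi + (1.0 - beta) * gi     # momentum: moving average of gradients
        vi = mi / (bias * scale)               # energy-scaled search direction
        ri = ri / (1.0 + 2.0 * lr * vi * vi)   # energy is divided by a factor >= 1
        new_theta.append(th - 2.0 * lr * ri * vi)
        new_m.append(mi)
        new_r.append(ri)
    return new_theta, new_m, new_r


def quadratic_loss(theta):
    """Toy objective f(theta) = 0.5 * ||theta||^2 (hypothetical test problem)."""
    return 0.5 * sum(x * x for x in theta)


def run_demo(steps=200, seed=0, c=1.0):
    """Run the sketch on the toy objective with noisy gradients."""
    rng = random.Random(seed)
    theta = [3.0, -2.0]
    m = [0.0, 0.0]
    r = [math.sqrt(quadratic_loss(theta) + c)] * 2   # r_0 = sqrt(f(theta_0) + c)
    energies = [list(r)]
    for t in range(1, steps + 1):
        loss = quadratic_loss(theta)
        grad = [x + rng.gauss(0.0, 0.1) for x in theta]   # stochastic gradient
        theta, m, r = sgem_step(theta, m, r, grad, loss, t, c=c)
        energies.append(list(r))
    return theta, energies


if __name__ == "__main__":
    final_theta, energies = run_demo()
    print("final loss:", quadratic_loss(final_theta))
    print("final energy:", energies[-1])
```

In this sketch the energy variable `r` is divided by a factor of at least 1 at every step, so it stays positive and never increases, whatever value `lr` takes. That is the kind of behavior the abstract refers to as unconditional energy stability, shown here only for the assumed update and not as a substitute for the paper's proof.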
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Bandit Algorithms Research
