SGEM: stochastic gradient with energy and momentum

Hailiang Liu; Xuping Tian

arXiv:2208.02208 · cs.LG · August 4, 2022

TL;DR

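SGEM is a stochastic optimization method that combines the energy variable of AEGD with momentum. It comes with an unconditional energy stability property and energy-dependent convergence rates. In experiments on some deep neural networks, it converges faster than AEGD and generalizes better than, or at least as well as, SGDM.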

Contribution

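Introduces SGEM, a stochastic gradient method that integrates energy and momentum. The paper proves an unconditional energy stability property and energy-dependent convergence rates for general non-convex stochastic problems, and it establishes a lower threshold for the energy variable.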
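To sketch where a stability property of this kind comes from, consider the energy update of AEGD, which SGEM builds on. This page does not state SGEM's exact rule, so the AEGD-style form below is an assumption. Here f is the (minibatch) loss, c is a constant with f + c > 0, \eta > 0 is the step size, v_k is the scaled gradient, and the energy r_k is a per-coordinate vector initialized to \sqrt{f(\theta_0) + c}.

```latex
\begin{aligned}
v_k &= \frac{\nabla f(\theta_k)}{2\sqrt{f(\theta_k) + c}}, \\
r_{k+1,i} &= \frac{r_{k,i}}{1 + 2\eta\, v_{k,i}^2}, \\
\theta_{k+1,i} &= \theta_{k,i} - 2\eta\, r_{k+1,i}\, v_{k,i}, \\
1 + 2\eta\, v_{k,i}^2 \ge 1 \;&\Longrightarrow\; 0 < r_{k+1,i} \le r_{k,i} \quad \text{for every } \eta > 0 .
\end{aligned}
```

The divisor is never smaller than 1, so in this sketch the energy cannot increase and stays positive whatever step size is chosen. This is the sense in which stability of this kind needs no step-size condition.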

Findings

01

SGEM converges faster than AEGD.

02

SGEM generalizes better than, or at least as well as, SGDM.

03

Provides energy-dependent convergence rates in the general non-convex stochastic setting and a regret bound in the online convex setting.
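A regret bound applies to the online convex setting. There, a method commits to a point θ_t, then observes a convex loss f_t. Regret compares the cumulative loss incurred with that of the best fixed point in hindsight, R(T) = Σ_t f_t(θ_t) − min_θ Σ_t f_t(θ). A bound that grows sublinearly in T means the average regret R(T)/T tends to zero. The sketch below computes this quantity for hypothetical one-dimensional quadratic losses f_t(x) = 0.5 (x − a_t)². It uses plain online gradient descent with step 1/√t as a stand-in learner. That learner is not SGEM, and the losses, step size, and function names are illustrative assumptions.

```python
import math
import random


def regret(played, targets):
    """Regret of the played points against the best fixed point in hindsight,
    for hypothetical online losses f_t(x) = 0.5 * (x - a_t) ** 2."""
    def loss(x, a):
        return 0.5 * (x - a) ** 2

    # For these quadratics the best fixed comparator is the mean of the targets.
    best = sum(targets) / len(targets)
    incurred = sum(loss(x, a) for x, a in zip(played, targets))
    comparator = sum(loss(best, a) for a in targets)
    return incurred - comparator


def online_gd(targets, x0=0.0):
    """Plain online gradient descent with step 1/sqrt(t) (a stand-in, not SGEM)."""
    x = x0
    played = []
    for t, a in enumerate(targets, start=1):
        played.append(x)                 # commit to x_t before a_t is revealed
        x -= (x - a) / math.sqrt(t)      # gradient of f_t at x_t is (x_t - a_t)
    return played


rng = random.Random(0)
targets = [rng.random() for _ in range(2000)]
r_T = regret(online_gd(targets), targets)
print(f"R(T) = {r_T:.3f}, R(T)/T = {r_T / len(targets):.5f}")
```

Any online method can be plugged in place of `online_gd`, because `regret` only needs the sequence of points the method actually played.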

Abstract

In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a large class of general non-convex stochastic optimization problems, based on the AEGD method that originated in the work [AEGD: Adaptive Gradient Descent with Energy. arXiv: 2010.05109]. SGEM incorporates both energy and momentum at the same time so as to inherit their dual advantages. We show that SGEM features an unconditional energy stability property, and derive energy-dependent convergence rates in the general nonconvex stochastic setting, as well as a regret bound in the online convex setting. A lower threshold for the energy variable is also provided. Our experimental results show that SGEM converges faster than AEGD and generalizes better or at least as well as SGDM in training some deep neural networks.
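The abstract describes SGEM as the energy variable of AEGD combined with momentum, but it does not spell out the update. The sketch below shows one hypothetical way to combine them. It assumes an AEGD-style scaled gradient v = ∇f / (2√(f + c)), an exponential moving average m of v, and an energy r that is divided by 1 + 2ηm² at every step. These choices are all assumptions for illustration: where the momentum enters, the constants `eta`, `beta`, `c`, the toy double-well loss, and every function name. The exact algorithm is given in the paper and in the official txping/sgem repository.

```python
import math
import random


def sgem_like_step(theta, r, m, loss, grad, eta=0.1, beta=0.9, c=1.0):
    """One step of a hypothetical energy-plus-momentum update (illustrative sketch).

    theta, r, m are lists of floats: parameters, per-coordinate energy, momentum.
    loss is the stochastic loss at theta and grad its stochastic gradient.
    """
    # AEGD-style scaled gradient v = grad / (2 * sqrt(loss + c)); assumes loss + c > 0.
    scale = 2.0 * math.sqrt(loss + c)
    v = [g / scale for g in grad]
    # Exponential moving average of v (where momentum enters is an assumption).
    m = [beta * mi + (1.0 - beta) * vi for mi, vi in zip(m, v)]
    # Energy update: the divisor 1 + 2*eta*m^2 is >= 1, so r can never grow.
    r = [ri / (1.0 + 2.0 * eta * mi * mi) for ri, mi in zip(r, m)]
    # Parameter update uses the freshly updated energy times the momentum.
    theta = [ti - 2.0 * eta * ri * mi for ti, ri, mi in zip(theta, r, m)]
    return theta, r, m


def toy_loss_and_grad(theta, a):
    """Hypothetical non-convex toy loss f(theta; a) = sum 0.25 * (t^2 - a)^2 >= 0."""
    loss = sum(0.25 * (t * t - a) ** 2 for t in theta)
    grad = [t * (t * t - a) for t in theta]
    return loss, grad


def run(steps=500, eta=0.1, c=1.0, seed=0):
    """Run the sketch on the toy loss; return initial loss, final loss, energy history."""
    rng = random.Random(seed)
    theta = [2.0, -1.5]
    loss0, _ = toy_loss_and_grad(theta, 1.0)
    # Energy starts at sqrt(f(theta_0) + c) in every coordinate.
    r = [math.sqrt(loss0 + c)] * len(theta)
    m = [0.0] * len(theta)
    history = [list(r)]
    for _ in range(steps):
        a = rng.uniform(0.8, 1.2)  # random sample playing the role of the data
        loss, grad = toy_loss_and_grad(theta, a)
        theta, r, m = sgem_like_step(theta, r, m, loss, grad, eta=eta, beta=0.9, c=c)
        history.append(list(r))
    final_loss, _ = toy_loss_and_grad(theta, 1.0)
    return loss0, final_loss, history


loss0, final_loss, history = run()
print(f"loss: {loss0:.4f} -> {final_loss:.4f}; final energy: {history[-1]}")
```

Each energy update divides by a factor of at least 1. As a result, the recorded energies never increase, both for the default `eta=0.1` and for any other positive step size (for example `run(steps=50, eta=10.0)`). On the toy double-well loss, the iterates also move from the starting point toward a minimizer.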

Code & Models

Repositories

txping/sgem
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Bandit Algorithms Research