Scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent

Kun Zeng; Jinlan Liu; Zhixia Jiang; Dongpo Xu

arXiv:2106.06753 · cs.LG · June 15, 2021

TL;DR

This paper introduces TSGD, a new optimization method that transitions from momentum SGD to plain SGD, combining fast training and high accuracy with a decreasing learning rate for stable convergence.

Contribution

The paper proposes a novel scaling transition from momentum SGD to plain SGD with a linearly decreasing learning rate, enhancing training speed and accuracy.
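The linearly decreasing learning rate mentioned above can be illustrated with a minimal sketch. The function name `linear_lr` and the parameters `lr_start`, `lr_end`, and `total_steps` are hypothetical and chosen for illustration; the paper's actual schedule and endpoint values are not given in this summary.

```python
def linear_lr(step, total_steps, lr_start=0.1, lr_end=0.001):
    """Learning rate that falls linearly from lr_start to lr_end.

    All names and default values are illustrative assumptions,
    not settings taken from the paper.
    """
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    # Straight-line interpolation between the two endpoints.
    return lr_start + (lr_end - lr_start) * frac


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(step, linear_lr(step, total_steps=100))
```

Clamping the fraction keeps the rate at `lr_end` if training runs past `total_steps`, which is one reasonable choice among several.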

Findings

01. Faster training speed compared to traditional SGD and momentum SGD.

02. Higher accuracy achieved with the TSGD method.

03. Improved stability during training.

Abstract

Plain stochastic gradient descent (SGD) and momentum SGD are widely used in deep learning because of their simple settings and low computational complexity. Momentum SGD uses the accumulated gradient as the update direction for the current parameters, which gives faster training. Plain SGD's direction is not corrected by the accumulated gradient, so for the parameters currently being updated it is the optimal direction, and its update is more accurate. We combine the fast training of momentum SGD with the high accuracy of plain SGD and propose a scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent (TSGD) method. At the same time, a learning rate…
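The idea of moving from a momentum direction to the plain gradient can be sketched as follows. This is an illustrative assumption, not the paper's stated update rule: the blend uses a scaling factor `rho ** step` that decays from 1 toward 0, so early steps follow the accumulated gradient and later steps follow the raw gradient. The names `tsgd_like_step`, `run_demo`, `beta`, and `rho`, and the toy objective f(x) = x², are all hypothetical.

```python
def tsgd_like_step(theta, grad, m, step, lr, beta=0.9, rho=0.98):
    """One update that blends a momentum direction with the plain gradient.

    The geometric scaling factor is an assumed choice for illustration.
    """
    # Momentum buffer: accumulated gradient.
    m = beta * m + grad
    # Scaling factor: 1 at step 0 (pure momentum), near 0 later (plain SGD).
    scale = rho ** step
    direction = scale * m + (1.0 - scale) * grad
    theta = theta - lr * direction
    return theta, m


def run_demo(steps=200, lr=0.1):
    """Minimize the toy objective f(x) = x**2 starting from x = 5."""
    theta, m = 5.0, 0.0
    for step in range(steps):
        grad = 2.0 * theta  # exact gradient of x**2
        theta, m = tsgd_like_step(theta, grad, m, step, lr)
    return theta


if __name__ == "__main__":
    print(run_demo())
```

A constant learning rate is used here to keep the sketch small; the truncated abstract indicates that the paper also addresses the learning rate, which this example does not attempt to reproduce.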

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Statistical Mechanics and Entropy