Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent

Bao Wang; Tan M. Nguyen; Andrea L. Bertozzi; Richard G. Baraniuk; Stanley J. Osher

arXiv:2002.10583 · cs.LG · April 28, 2020 · 5 cites

Open Access · 1 Repo

TL;DR

This paper introduces Scheduled Restart SGD (SRSGD), a NAG-style optimizer for deep neural network training. It replaces the constant momentum in SGD with the increasing momentum of NAG and resets that momentum to zero on a schedule, improving convergence and generalization over standard SGD.

Contribution

SRSGD is a new NAG-inspired method that keeps NAG's increasing momentum stable under inexact (stochastic) gradients by resetting it to zero on a schedule, leading to faster convergence and better accuracy when training deep neural networks.
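
To make the idea of a scheduled reset concrete, the three momentum choices can be compared side by side. This is a sketch under stated assumptions, not a formula quoted from this page: the form k/(k+3) is one common way of writing NAG's increasing momentum, and the symbol F for the restart interval is a hypothetical name introduced here.

```latex
\begin{align*}
\text{SGD with constant momentum:}\quad & \mu_k = \mu \in [0, 1) \\
\text{NAG-style increasing momentum (assumed form):}\quad & \mu_k = \frac{k}{k+3} \\
\text{Scheduled restart every } F \text{ iterations:}\quad & \mu_k = \frac{k \bmod F}{(k \bmod F) + 3}
\end{align*}
```

Under this form the momentum climbs toward 1 within each window of F iterations and drops back to 0 at the start of the next window.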

Findings

01. SRSGD improves convergence speed over standard SGD.

02. SRSGD achieves lower error rates on ImageNet and CIFAR datasets.

03. SRSGD requires fewer epochs to reach comparable or better accuracy.

Abstract

Stochastic gradient descent (SGD) with constant momentum and its variants such as Adam are the optimization algorithms of choice for training deep neural networks (DNNs). Since DNN training is incredibly computationally expensive, there is great interest in speeding up the convergence. Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst. In this paper, we propose Scheduled Restart SGD (SRSGD), a new NAG-style scheme for training DNNs. SRSGD replaces the constant momentum in SGD by the increasing momentum in NAG but stabilizes the iterations by resetting the momentum to zero according to a schedule. Using a variety of models and benchmarks for image…
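
The scheme described in the abstract (NAG's increasing momentum, reset to zero on a schedule) can be illustrated with a small sketch. This is not the authors' implementation: the momentum form j/(j+3), the fixed restart interval `restart_every`, the toy noisy quadratic, and every function name below are assumptions made for illustration only.

```python
import random


def srsgd_momentum(k, restart_every):
    """Momentum at iteration k: grows like j/(j+3) (assumed NAG form),
    where j counts iterations since the last scheduled restart."""
    j = k % restart_every
    return j / (j + 3)


def srsgd(grad, w0, lr, restart_every, steps):
    """Scheduled-restart, NAG-style iteration on a list of floats (toy sketch)."""
    w = list(w0)
    v_prev = list(w0)
    for k in range(steps):
        g = grad(w)
        # Plain gradient step from the current point.
        v = [wi - lr * gi for wi, gi in zip(w, g)]
        # Extrapolate with the scheduled momentum; mu is 0 right after a restart.
        mu = srsgd_momentum(k, restart_every)
        w = [vi + mu * (vi - vpi) for vi, vpi in zip(v, v_prev)]
        v_prev = v
    return w


def noisy_quadratic_grad(w, rng, noise=0.01):
    """Gradient of 0.5 * sum(c_i * w_i**2) plus Gaussian noise,
    a toy stand-in for an inexact minibatch gradient."""
    curvature = (1.0, 10.0)
    return [c * wi + rng.gauss(0.0, noise) for c, wi in zip(curvature, w)]


rng = random.Random(0)
w_final = srsgd(
    lambda w: noisy_quadratic_grad(w, rng),
    w0=[5.0, -3.0],
    lr=0.05,
    restart_every=40,
    steps=400,
)
print(w_final)
```

In this sketch the restart interval is fixed. A schedule that varies over training would only need `srsgd_momentum` to track the iteration of the most recent restart instead of using a modulus.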

Code & Models

Repositories

minhtannguyen/SRSGD (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning

Methods: Nesterov Accelerated Gradient · Adam · Stochastic Gradient Descent