SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov; Frank Hutter

arXiv:1608.03983·cs.LG·May 4, 2017·1.7k cites

TL;DR

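This paper introduces SGDR, a simple warm restart technique for stochastic gradient descent that improves anytime performance when training deep neural networks. It sets new state-of-the-art results on CIFAR-10 and CIFAR-100 and also shows advantages on a dataset of EEG recordings and on a downsampled version of ImageNet.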

Contribution

It proposes a simple warm restart technique for SGD that improves its anytime performance when training deep neural networks.

Findings

01

Achieved new state-of-the-art results on CIFAR-10 and CIFAR-100, with test errors of 3.14% and 16.21%, respectively.

02

Demonstrated advantages on a dataset of EEG recordings and on a downsampled version of ImageNet.

03

Showed that warm restarts improve the anytime performance of SGD when training deep neural networks.

Abstract

Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions. In this paper, we propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art results at 3.14% and 16.21%, respectively. We also demonstrate its advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset. Our source code is available at https://github.com/loshchil/SGDR

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques

Methods: Cosine Annealing
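
The abstract does not spell out the schedule itself. As a minimal sketch, assuming the cosine annealing form named under Methods, the learning rate within each cycle decays from a maximum to a minimum along a half cosine and is then reset to the maximum (the warm restart). The function name `sgdr_learning_rate`, its parameter names, and the default values below are illustrative choices, not settings taken from this page.

```python
import math


def sgdr_learning_rate(epoch, eta_min=0.0, eta_max=0.05, t_0=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts (illustrative sketch).

    epoch   : epochs since the start of training (may be fractional)
    eta_min : lowest learning rate reached at the end of a cycle
    eta_max : learning rate right after each restart
    t_0     : length of the first cycle, in epochs (assumed default)
    t_mult  : factor by which each cycle is longer than the last (>= 1)
    """
    t_i = t_0        # length of the current cycle
    t_cur = epoch    # epochs elapsed since the most recent restart
    # Walk through completed cycles until we land inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Half-cosine decay from eta_max (t_cur = 0) towards eta_min (t_cur -> t_i).
    cosine = 1.0 + math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (eta_max - eta_min) * cosine


if __name__ == "__main__":
    # With t_0=10 and t_mult=2, restarts fall at epochs 10, 30, 70, ...
    for e in (0, 5, 9.9, 10, 20, 29.9, 30):
        print(f"epoch {e:>5}: lr = {sgdr_learning_rate(e):.5f}")
```

In practice the value would be recomputed every batch with a fractional `epoch` and written into the optimizer before each update. PyTorch users may prefer the built-in `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, which exposes similar `T_0`, `T_mult`, and `eta_min` arguments.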