SGDR: Stochastic Gradient Descent with Warm Restarts
Ilya Loshchilov, Frank Hutter

TL;DR
This paper introduces SGDR, a warm restart technique for stochastic gradient descent that improves anytime performance when training deep neural networks. It reports new state-of-the-art results on CIFAR-10 and CIFAR-100 and shows advantages on EEG recordings and a downsampled version of ImageNet.
Contribution
It proposes a simple warm restart technique for SGD that improves its anytime performance when training deep neural networks.
Findings
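Achieved new state-of-the-art results of 3.14% and 16.21% test error on CIFAR-10 and CIFAR-100, respectively.
Demonstrated advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset.
Showed that warm restarts improve the anytime performance of SGD, that is, the quality of the model available at any point during training.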
Abstract
Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions. In this paper, we propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art results at 3.14% and 16.21%, respectively. We also demonstrate its advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset. Our source code is available at https://github.com/loshchil/SGDR
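The abstract does not spell out the schedule itself, so the following is only a minimal sketch of how a cosine-annealed learning rate with warm restarts can be computed. It assumes the commonly cited form eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)), where T_cur is the time since the last restart and T_i is the length of the current cycle. The function name `sgdr_learning_rate` and the default values for `eta_max`, `t_0`, and `t_mult` are illustrative assumptions, not settings taken from this page.

```python
import math


def sgdr_learning_rate(epoch, eta_min=0.0, eta_max=0.05, t_0=10, t_mult=2):
    """Return a cosine-annealed learning rate with warm restarts.

    epoch   -- training progress in epochs (may be fractional)
    eta_min -- lowest learning rate reached at the end of a cycle
    eta_max -- learning rate restored at every restart
    t_0     -- length of the first cycle in epochs (assumed > 0)
    t_mult  -- integer factor (>= 1) by which each cycle grows

    All names and defaults here are hypothetical, chosen for illustration.
    """
    t_i = t_0      # length of the current cycle
    t_cur = epoch  # epochs elapsed since the last restart
    # Skip over every completed cycle, growing the cycle length each time.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine decay from eta_max down toward eta_min within the cycle.
    cosine = math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + cosine)


if __name__ == "__main__":
    # Print the schedule at a few points; restarts occur at epochs 10 and 30.
    for e in (0, 5, 9.9, 10, 20, 29.9, 30):
        print(e, sgdr_learning_rate(e))
```

With these assumed defaults the rate starts at `eta_max`, decays toward `eta_min` over 10 epochs, jumps back to `eta_max` at epoch 10, and then decays over a cycle twice as long. Setting `t_mult=1` keeps every cycle the same length.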
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques
Methods: Cosine Annealing
