TL;DR
This paper describes 'super-convergence', a phenomenon in which neural networks train an order of magnitude faster than with standard methods when trained with a single learning rate cycle and a large maximum learning rate. The performance gain is largest when labeled data is limited.
Contribution
The paper identifies super-convergence, demonstrates it across multiple datasets and architectures, and derives a simplification of the Hessian Free optimization method that estimates the optimal learning rate.
Findings
Super-convergence accelerates training by an order of magnitude compared with standard training methods.
Large learning rates act as a regularizer, so other forms of regularization must be reduced to keep the overall regularization balanced.
The performance gain from super-convergence is larger when labeled data is limited.
Abstract
In this paper, we describe a phenomenon, which we named "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with one learning rate cycle and a large maximum learning rate. A primary insight that allows super-convergence training is that large learning rates regularize the training, hence requiring a reduction of all other forms of regularization in order to preserve an optimal regularization balance. We also derive a simplification of the Hessian Free optimization method to compute an estimate of the optimal learning rate. Experiments demonstrate super-convergence for Cifar-10/100, MNIST and Imagenet datasets, and resnet, wide-resnet, densenet, and inception…
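As a rough illustration of "one learning rate cycle and a large maximum learning rate", the sketch below computes a piecewise-linear schedule: the rate rises from a minimum to a large maximum, falls back to the minimum, then decays further over a short tail. The function name `one_cycle_lr`, the linear shape, the tail phase, and every numeric default are assumptions made for this example; they are not taken from the abstract and should not be read as the paper's settings.

```python
def one_cycle_lr(step, total_steps, tail_steps=10,
                 min_lr=0.1, max_lr=1.0, final_lr=0.001):
    """Learning rate at `step` for a single linear up/down cycle plus a tail.

    All names and default values here are hypothetical, chosen only to
    illustrate the shape of a one-cycle schedule.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    if not 0 < tail_steps < total_steps:
        raise ValueError("tail_steps must satisfy 0 < tail_steps < total_steps")

    cycle_steps = total_steps - tail_steps  # steps spent in the main cycle
    half = cycle_steps / 2                  # the peak sits mid-cycle

    if step <= half:
        # Ramp up linearly from min_lr to max_lr.
        return min_lr + (max_lr - min_lr) * step / half
    if step <= cycle_steps:
        # Ramp back down linearly from max_lr to min_lr.
        return max_lr - (max_lr - min_lr) * (step - half) / half
    # Tail: keep decaying linearly from min_lr toward final_lr.
    return min_lr - (min_lr - final_lr) * (step - cycle_steps) / tail_steps


if __name__ == "__main__":
    schedule = [one_cycle_lr(s, total_steps=100) for s in range(100)]
    peak_step = max(range(100), key=lambda s: schedule[s])
    print(f"peak lr {schedule[peak_step]:.3f} at step {peak_step}")
```

In a training loop, the returned value would be assigned to the optimizer's learning rate before each update. Because the abstract states that large learning rates themselves regularize training, other regularization (weight decay, dropout) would typically be reduced alongside such a schedule.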
