Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates

Leslie N. Smith; Nicholay Topin

arXiv:1708.07120·cs.LG·May 18, 2018

TL;DR

This paper describes 'super-convergence', a phenomenon in which neural networks trained with one learning rate cycle and a large maximum learning rate converge an order of magnitude faster than with standard training methods. The performance gain is largest when labeled data is limited.

Contribution

The paper introduces the concept of super-convergence, demonstrates it across multiple datasets and architectures, and derives a simplification of the Hessian Free optimization method to estimate the optimal learning rate.

Findings

01 Super-convergence accelerates training by an order of magnitude compared with standard training methods.

02 Large learning rates act as a regularizer, so other forms of regularization must be reduced to keep the overall balance.

03 The performance gain from super-convergence is larger when labeled training data is limited.

Abstract

In this paper, we describe a phenomenon, which we named "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with one learning rate cycle and a large maximum learning rate. A primary insight that allows super-convergence training is that large learning rates regularize the training, hence requiring a reduction of all other forms of regularization in order to preserve an optimal regularization balance. We also derive a simplification of the Hessian Free optimization method to compute an estimate of the optimal learning rate. Experiments demonstrate super-convergence for Cifar-10/100, MNIST and Imagenet datasets, and resnet, wide-resnet, densenet, and inception…
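The "one learning rate cycle with a large maximum learning rate" idea from the abstract can be illustrated with a small schedule function. This is a minimal sketch, not the authors' implementation: the piecewise-linear shape, the function name `one_cycle_lr`, and the values `min_lr=0.1` and `max_lr=3.0` are assumptions chosen for illustration only.

```python
def one_cycle_lr(step, total_steps, min_lr=0.1, max_lr=3.0):
    """Return the learning rate for `step` in a single linear cycle.

    Assumed shape (hypothetical, for illustration): the rate rises
    linearly from min_lr to max_lr over the first half of training,
    then falls linearly back to min_lr over the second half.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    half = (total_steps - 1) / 2
    # Fraction of the way to the peak: 0 at both ends, 1 at the midpoint.
    frac = 1.0 - abs(step - half) / half
    return min_lr + (max_lr - min_lr) * frac


# Illustrative 11-step run: the rate peaks at the middle step.
schedule = [one_cycle_lr(s, 11) for s in range(11)]
```

In a training loop, the value returned for each step would be assigned to the optimizer's learning rate before the update. Because the abstract states that large learning rates regularize training, other regularization (for example weight decay or dropout) would typically be reduced when using such a schedule.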
