LRTuner: A Learning Rate Tuner for Deep Neural Networks

Nikhil Iyer; V Thejas; Nipun Kwatra; Ramachandran Ramjee; Muthian Sivathanu

arXiv:2105.14526 · cs.LG · June 1, 2021

TL;DR

LRTuner is an adaptive learning rate tuning method for deep neural networks that improves accuracy and reduces training time across various datasets and models by tuning the learning rate during training.

Contribution

We introduce LRTuner, a novel method for dynamically tuning learning rates during training, applicable to any optimizer, enhancing performance and efficiency over traditional schedules.

Findings

01. LRTuner improves test accuracy by up to 0.2% on ImageNet with ResNet-50.

02. LRTuner reduces the number of training steps by 29% while maintaining accuracy.

03. LRTuner outperforms standard hand-tuned learning rate schedules.

Abstract

One very important hyperparameter for training deep neural networks is the learning rate schedule of the optimizer. The choice of learning rate schedule determines the computational cost of getting close to a minimum, how close you actually get to it, and, most importantly, the kind of local minimum (wide or narrow) attained. The kind of minimum attained has a significant impact on the generalization accuracy of the network. Current systems employ hand-tuned learning rate schedules, which are painstakingly tuned for each network and dataset. Given that the space of possible schedules is huge, finding a satisfactory learning rate schedule can be very time-consuming. In this paper, we present LRTuner, a method for tuning the learning rate as training proceeds. Our method works with any optimizer, and we demonstrate results on SGD with Momentum and Adam. We extensively evaluate…

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · SGD with Momentum · WordPiece · Dropout