AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy

TL;DR
AutoLRS automatically tunes learning rate schedules during neural network training using Bayesian optimization, significantly reducing manual effort and improving training speed across diverse models and tasks.
Contribution
We introduce AutoLRS, a novel method that dynamically and automatically optimizes learning rates during training with minimal manual tuning, leveraging Bayesian optimization and loss prediction.
Findings
Achieves up to 1.5x speedup in training ResNet-50, Transformer, and BERT.
Outperforms manually tuned learning rate schedules and state-of-the-art methods.
Demonstrates generality across different models and datasets.
Abstract
The learning rate (LR) schedule is one of the most important hyperparameters needing careful tuning in training DNNs. However, it is also one of the least automated parts of machine learning systems and usually costs significant manual effort and computing. Though there are pre-defined LR schedules and optimizers with adaptive LR, they introduce new hyperparameters that need to be tuned separately for different tasks/datasets. In this paper, we consider the question: Can we automatically tune the LR over the course of training without human involvement? We propose an efficient method, AutoLRS, which automatically optimizes the LR for each training stage by modeling training dynamics. AutoLRS aims to find an LR applied to every τ steps that minimizes the resulting validation loss. We solve this black-box optimization on the fly by Bayesian optimization (BO). However, collecting…
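The per-stage search described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. It assumes a two-dimensional quadratic loss in place of a real network and its validation loss, a hand-written Gaussian process with a lower-confidence-bound acquisition as the BO component, and hypothetical names throughout (`train_stage`, `pick_lr`, `tau`, `kappa`). Each candidate LR is scored by running a full stage from the same starting weights, which a real training loop would only be able to do by checkpointing and restoring; the loss-prediction component mentioned in the Contribution is not modeled here.

```python
import numpy as np

# Toy objective (assumption): loss(w) = 0.5 * sum(c * w**2) with two curvatures.
CURVATURE = np.array([1.0, 10.0])

def loss(w):
    return 0.5 * float(np.sum(CURVATURE * w ** 2))

def train_stage(w, lr, tau):
    """Run tau gradient-descent steps at a constant LR; returns new weights."""
    for _ in range(tau):
        w = w - lr * CURVATURE * w  # builds a new array, the input is untouched
    return w

def gp_posterior(x_obs, y_obs, x_query, length_scale=0.5, noise=1e-4):
    """Posterior mean and std of a GP with an RBF kernel on standardized targets."""
    def kernel(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)
    mu, sigma = y_obs.mean(), y_obs.std()
    if sigma == 0:
        sigma = 1.0
    K = kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = kernel(x_obs, x_query)  # shape (n_obs, n_query)
    alpha = np.linalg.solve(K, (y_obs - mu) / sigma)
    mean = K_s.T @ alpha
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    std = np.sqrt(np.clip(var, 0.0, None))
    return mu + sigma * mean, sigma * std

def pick_lr(w, tau, lo=-3.0, hi=-0.5, n_init=3, n_iter=7, kappa=2.0):
    """Search log10(LR) in [lo, hi] for the LR giving the lowest loss after tau steps."""
    def objective(log_lr):
        # Log of the loss keeps diverging LRs from dominating the GP fit.
        return np.log10(loss(train_stage(w, 10.0 ** log_lr, tau)) + 1e-30)
    grid = np.linspace(lo, hi, 200)
    x = np.linspace(lo, hi, n_init)  # deterministic initial design
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)
        x_next = grid[np.argmin(mean - kappa * std)]  # lower confidence bound
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    return 10.0 ** x[np.argmin(y)]  # best LR actually evaluated

def run(num_stages=4, tau=10):
    """Pick one LR per stage, then train that stage with it."""
    w = np.array([1.0, 1.0])
    schedule, losses = [], [loss(w)]
    for _ in range(num_stages):
        lr = pick_lr(w, tau)
        w = train_stage(w, lr, tau)
        schedule.append(lr)
        losses.append(loss(w))
    return schedule, losses

schedule, losses = run()
```

Here `schedule` holds one LR per stage and `losses` holds the toy loss before training and after each stage. Searching in log space is a design choice of this sketch, made because useful LRs typically span several orders of magnitude.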
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Residual Connection
