Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms
Dominic Kafka, Daniel Nicolas Wilke

TL;DR
This paper demonstrates that the Gradient-Only Line Search that is Inexact (GOLS-I) can automatically determine effective learning rate schedules across a range of neural network architectures and training algorithms, reducing the need for manual tuning of the learning rate.
Contribution
The study applies GOLS-I to multiple neural network training algorithms and architectures, showing its effectiveness in automatically setting learning rates over a wide range.
Findings
GOLS-I's learning rate schedules are competitive with manually tuned learning rates.
Algorithms with dominant momentum characteristics are less compatible with GOLS-I.
For most of the algorithms tested, GOLS-I determines effective learning rates over a range spanning 15 orders of magnitude.
Abstract
Gradient-only and probabilistic line searches have recently reintroduced the ability to adaptively determine learning rates in dynamic mini-batch sub-sampled neural network training. However, stochastic line searches are still in their infancy and thus call for an ongoing investigation. We study the application of the Gradient-Only Line Search that is Inexact (GOLS-I) to automatically determine the learning rate schedule for a selection of popular neural network training algorithms, including NAG, Adagrad, Adadelta, Adam and LBFGS, with numerous shallow, deep and convolutional neural network architectures trained on different datasets with various loss functions. We find that GOLS-I's learning rate schedules are competitive with manually tuned learning rates, over seven optimization algorithms, three types of neural network architecture, 23 datasets and two loss functions. We…
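To make the idea of a gradient-only line search concrete, here is a minimal Python sketch. It is not the authors' GOLS-I implementation: the function names, the growth factor `eta`, and the step bounds `alpha_min` and `alpha_max` are all assumptions chosen for illustration (the bounds are picked only so that they span 15 orders of magnitude, the range quoted above). The sketch never evaluates the loss. It only checks the sign of the directional derivative along the search direction, growing the step while the slope is still negative and shrinking it while the slope is positive. In a mini-batch setting, each call to `grad_fn` would presumably draw a fresh mini-batch; here a deterministic quadratic stands in for the network.

```python
def directional_derivative(grad_fn, x, d, alpha):
    """Slope of the loss along direction d at step size alpha (gradient only)."""
    point = [xi + alpha * di for xi, di in zip(x, d)]
    g = grad_fn(point)
    return sum(gi * di for gi, di in zip(g, d))


def gradient_only_line_search(grad_fn, x, d, alpha0=1.0, eta=2.0,
                              alpha_min=1e-8, alpha_max=1e7):
    """Hypothetical inexact search for a step near a slope sign change.

    All default values are illustrative assumptions, not values from the paper.
    """
    alpha = min(max(alpha0, alpha_min), alpha_max)
    slope = directional_derivative(grad_fn, x, d, alpha)
    if slope < 0:
        # Still descending at this step: grow until the slope is no longer negative.
        while slope < 0 and alpha * eta <= alpha_max:
            alpha *= eta
            slope = directional_derivative(grad_fn, x, d, alpha)
    else:
        # Overshot the minimum along d: shrink until the slope is no longer positive.
        while slope > 0 and alpha / eta >= alpha_min:
            alpha /= eta
            slope = directional_derivative(grad_fn, x, d, alpha)
    return alpha


def quadratic_grad(point):
    """Gradient of the toy loss 0.5 * sum(p**2); stands in for a network gradient."""
    return list(point)


x = [4.0]
d = [-g for g in quadratic_grad(x)]  # steepest-descent direction
step_from_small = gradient_only_line_search(quadratic_grad, x, d, alpha0=1e-3)
step_from_large = gradient_only_line_search(quadratic_grad, x, d, alpha0=100.0)
print(step_from_small, step_from_large)
```

For this toy problem the exact minimizer along `d` is at a step of 1, and both a very small and a very large initial guess end up within a factor of `eta` of it, which is the sense in which such a search removes the need to hand-pick the learning rate.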
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Adam
