L4: Practical loss-based stepsize adaptation for deep learning
Michal Rolinek and Georg Martius

TL;DR
This paper introduces a practical stepsize adaptation method for stochastic gradient descent that improves optimizer performance across various neural network architectures and datasets without a measurable increase in computational cost.
Contribution
It presents a novel loss-based stepsize adaptation scheme that enhances existing optimizers like Adam and Momentum with minimal overhead.
Findings
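With default hyperparameters, the enhanced Adam and Momentum optimizers consistently outperform their constant-stepsize counterparts, including the best-tuned ones.
The method is validated on dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer, using MNIST, Fashion MNIST, CIFAR10, and other datasets.
No measurable increase in computational cost is reported.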
Abstract
We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by conclusively improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer on classical datasets MNIST, fashion MNIST, CIFAR10 and others.
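The idea of rescaling the gradient "to make fixed predicted progress on the loss" can be illustrated with a small sketch. Under a first-order approximation, a step of size eta along an update direction v lowers the loss by roughly eta times the dot product of the gradient and v, so one can solve for the eta whose predicted decrease equals a fixed fraction of the remaining loss. The function name `loss_based_stepsize`, the fraction `alpha`, its illustrative value, the lower bound `loss_min`, and the toy quadratic problem are all assumptions made for this example; the summary above does not specify them, and this is not the authors' implementation.

```python
def dot(a, b):
    """Dot product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def loss_based_stepsize(loss, grad, direction, alpha=0.15, loss_min=0.0, eps=1e-12):
    """Hypothetical helper: pick eta so the linearized loss decrease,
    eta * <grad, direction>, equals alpha * (loss - loss_min).

    alpha and loss_min are illustrative assumptions, not values taken
    from the summary above. eps guards against division by zero.
    """
    return alpha * (loss - loss_min) / (dot(grad, direction) + eps)


# Toy problem (assumption): L(theta) = 0.5 * ||theta||^2, so grad = theta.
def loss_fn(theta):
    return 0.5 * dot(theta, theta)


def grad_fn(theta):
    return list(theta)


theta = [3.0, -4.0]
losses = [loss_fn(theta)]
for _ in range(20):
    g = grad_fn(theta)
    # Plain gradient descent: the update direction is the gradient itself.
    # For Adam or Momentum, `direction` would be that optimizer's update vector.
    eta = loss_based_stepsize(loss_fn(theta), g, g)
    theta = [t - eta * gi for t, gi in zip(theta, g)]
    losses.append(loss_fn(theta))
```

On this particular quadratic the rule yields a stepsize of about alpha / 2 at every step, because the loss and the squared gradient norm shrink together. On a real network the ratio changes from batch to batch, which is what makes the stepsize adaptive.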
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications
Methods: Adam
