L4: Practical loss-based stepsize adaptation for deep learning

Michal Rolinek and Georg Martius

arXiv:1802.05074 · cs.LG · December 3, 2018 · 27 cites

TL;DR

This paper introduces a practical loss-based stepsize adaptation method for stochastic gradient descent. Applied to Adam and Momentum, it improves on their constant-stepsize versions across several neural network architectures and datasets without a measurable increase in computational cost.

Contribution

It presents a novel loss-based stepsize adaptation scheme that enhances existing optimizers like Adam and Momentum with minimal overhead.

Findings

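01. The enhanced optimizers, run with default hyperparameters, outperform their constant-stepsize counterparts, even the best ones.

02. The method works across multiple architectures (dense nets, CNNs, ResNets, the Differential Neural Computer) and datasets.

03. No measurable increase in computational cost.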

Abstract

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by conclusively improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer on classical datasets MNIST, fashion MNIST, CIFAR10 and others.
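The phrase "fixed predicted progress on the loss" can be made concrete with a first-order model of the loss. The sketch below is an illustration under stated assumptions, not the authors' implementation: it picks the stepsize so that the linearized loss drops by a fixed fraction of the gap between the current loss and a target value. The names `l4_stepsize`, `alpha`, `l_min`, and `eps`, and the specific values used, are hypothetical choices made for this example; how the paper sets the target and the fraction is not stated in the abstract.

```python
def l4_stepsize(loss, grad, direction, l_min=0.0, alpha=0.15, eps=1e-12):
    """Stepsize for which the linearized loss drops by alpha * (loss - l_min).

    First-order model: L(theta - eta * v) ~ L(theta) - eta * <grad, v>.
    alpha, l_min, and eps are illustrative assumptions, not values
    taken from the abstract.
    """
    slope = sum(g * v for g, v in zip(grad, direction))
    return alpha * (loss - l_min) / (slope + eps)


def quadratic_loss(theta):
    # Toy objective used only for this demonstration.
    return 0.5 * sum(t * t for t in theta)


def quadratic_grad(theta):
    return list(theta)


def adaptive_gradient_step(theta, loss_fn, grad_fn, **kwargs):
    """One gradient step whose stepsize is chosen by l4_stepsize."""
    loss = loss_fn(theta)
    grad = grad_fn(theta)
    # The plain gradient serves as the update direction here; an optimizer
    # such as Adam or Momentum would supply its own direction instead.
    eta = l4_stepsize(loss, grad, grad, **kwargs)
    return [t - eta * g for t, g in zip(theta, grad)]


theta = [3.0, -4.0]
for _ in range(5):
    theta = adaptive_gradient_step(theta, quadratic_loss, quadratic_grad)
    print(round(quadratic_loss(theta), 4))
```

On this toy quadratic the stepsize comes out constant because the loss and the squared gradient norm shrink together. On a real network the ratio changes from step to step, which is where the adaptation happens.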

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications

Methods: Adam