Gradient descent revisited via an adaptive online learning rate

Mathieu Ravaut; Satya Gorti

arXiv:1801.09136·stat.ML·April 10, 2018·5 cites

TL;DR

This paper introduces a gradient descent variant that learns its learning rate during training, which reduces the need for manual tuning and may improve convergence for deep models.

Contribution

It proposes a variation of gradient descent in which the learning rate is itself learned during training, either by a second gradient descent (first-order) or by Newton's method (second-order).
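
One way to make this concrete is to treat the loss after a single step as a function of the learning rate alone and apply the two update rules to that function. The notation below is an assumption for illustration and is not taken from the paper: $L$ is the loss, $w_t$ the parameters, $\eta_t$ the learning rate, $\alpha$ a step size for the learning-rate update, and $H$ the Hessian of $L$.

```latex
% Loss after one step, viewed as a function of the learning rate
g_t(\eta) = L\bigl(w_t - \eta \nabla L(w_t)\bigr)

% Its first and second derivatives with respect to \eta (chain rule)
g_t'(\eta)  = -\nabla L(w_t)^{\top} \, \nabla L\bigl(w_t - \eta \nabla L(w_t)\bigr)
g_t''(\eta) = \nabla L(w_t)^{\top} \, H\bigl(w_t - \eta \nabla L(w_t)\bigr) \, \nabla L(w_t)

% First-order update of the learning rate
\eta_{t+1} = \eta_t - \alpha \, g_t'(\eta_t)

% Second-order (Newton) update of the learning rate
\eta_{t+1} = \eta_t - \frac{g_t'(\eta_t)}{g_t''(\eta_t)}

% Parameter update with the refreshed learning rate
w_{t+1} = w_t - \eta_{t+1} \nabla L(w_t)
```

In practice the derivatives of $g_t$ would likely be approximated rather than computed exactly; how the paper does this is not stated on this page.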

Findings

01. Adaptive learning rate improves convergence efficiency.

02. Method reduces manual tuning effort.

03. Applicable to various machine learning algorithms.

Abstract

Any gradient descent optimization requires choosing a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to an ideal convergence. We propose a variation of the gradient descent algorithm in which the learning rate is not fixed. Instead, we learn the learning rate itself, either by another gradient descent (first-order method), or by Newton's method (second-order). This way, gradient descent for any machine learning algorithm can be optimized.
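
The idea in the abstract can be illustrated with a minimal sketch on a toy quadratic loss. This is not the authors' implementation. The function name `adaptive_gd`, the parameter `meta_lr`, and the use of exact analytic derivatives are assumptions made for illustration. At each step the loss after the update is treated as a function of the learning rate `eta`, and `eta` is adjusted either by a gradient step or by a Newton step before the weights are updated.

```python
import numpy as np

def loss(w, A, b):
    # Toy quadratic loss: 0.5 * w^T A w - b^T w (A symmetric positive definite)
    return 0.5 * w @ A @ w - b @ w

def grad(w, A, b):
    # Gradient of the quadratic loss above
    return A @ w - b

def adaptive_gd(A, b, w0, eta0=0.01, meta_lr=None, steps=100):
    """Gradient descent where the learning rate eta is itself updated each step.

    meta_lr=None  -> Newton (second-order) update on eta
    meta_lr=float -> gradient (first-order) update on eta with that step size
    Hypothetical helper for illustration only.
    """
    w = np.asarray(w0, dtype=float)
    eta = float(eta0)
    for _ in range(steps):
        g = grad(w, A, b)
        if g @ g < 1e-20:          # already at a stationary point
            break
        # Derivative of loss(w - eta * g) with respect to eta
        d1 = -grad(w - eta * g, A, b) @ g
        if meta_lr is None:
            # Second derivative with respect to eta; the Hessian of this toy loss is A
            d2 = g @ A @ g
            eta = eta - d1 / d2
        else:
            eta = eta - meta_lr * d1
        w = w - eta * g            # ordinary descent step with the refreshed eta
    return w, eta

# Small demo on an ill-conditioned 2D quadratic (values chosen arbitrarily)
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
w_newton, eta_newton = adaptive_gd(A, b, np.zeros(2))
w_first, eta_first = adaptive_gd(A, b, np.zeros(2), meta_lr=0.01, steps=200)
print("Newton on eta:     ", w_newton, eta_newton)
print("First-order on eta:", w_first, eta_first)
```

On a quadratic, the Newton update on `eta` coincides with an exact line search along the negative gradient, which is why it needs no step size of its own. For a neural network, the Hessian is not available in closed form, so the two derivatives would have to be approximated, for example by finite differences of the loss.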

Code & Models

Repositories

UrosOgrizovic/SimpleGoogleQuickdraw (tf)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Sparse and Compressive Sensing Techniques