Gradient descent revisited via an adaptive online learning rate
Mathieu Ravaut, Satya Gorti

TL;DR
This paper introduces a gradient descent variant that learns its learning rate during training, aiming to reduce manual tuning and improve convergence for deep models.
Contribution
The paper proposes a variation of gradient descent in which the learning rate is itself learned, either by another gradient descent (first-order) or by Newton's method (second-order).
Findings
An adaptive learning rate improves convergence efficiency.
The method reduces manual tuning effort.
The approach applies to any machine learning algorithm trained by gradient descent.
Abstract
Any gradient descent optimization requires choosing a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to ideal convergence. We propose a variation of the gradient descent algorithm in which the learning rate is not fixed. Instead, we learn the learning rate itself, either by another gradient descent (first-order method) or by Newton's method (second-order method). This way, gradient descent for any machine learning algorithm can be optimized.
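The abstract does not spell out the update rules, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes the learning rate alpha is tuned by looking at the loss after the step, phi(alpha) = loss(w - alpha * g), and updating alpha either with a gradient step on phi (first-order) or a Newton step on phi (second-order). The quadratic objective, the names `A`, `b`, `loss`, `grad`, `train`, and the meta step size `beta` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy objective: loss(w) = 0.5 * w^T A w - b^T w, with A symmetric positive definite.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def loss(w):
    return 0.5 * w @ A @ w - b @ w

def grad(w):
    return A @ w - b

def train(mode, steps=200, alpha=0.1, beta=0.01):
    """Gradient descent on w while also updating the learning rate alpha.

    mode="first_order": alpha takes a gradient step on phi(alpha) = loss(w - alpha * g).
    mode="newton":      alpha takes a Newton step on the same phi.
    beta is an assumed step size for the first-order update of alpha.
    """
    w = np.zeros(2)
    for _ in range(steps):
        g = grad(w)
        # phi'(alpha) = -grad(w - alpha * g) . g   (chain rule)
        d1 = -grad(w - alpha * g) @ g
        if mode == "first_order":
            alpha = alpha - beta * d1
        elif mode == "newton":
            # phi''(alpha) = g^T H g; the Hessian H equals A for this quadratic.
            d2 = g @ A @ g
            if d2 > 1e-20:  # skip the update once the gradient has vanished
                alpha = alpha - d1 / d2
        else:
            raise ValueError(f"unknown mode: {mode}")
        w = w - alpha * g
    return w, alpha
```

Calling `train("first_order")` and `train("newton")` and comparing the returned weights with `np.linalg.solve(A, b)` gives a quick sanity check. On a quadratic, the Newton update for alpha coincides with an exact line search along the gradient direction. For a deep model the second derivative would need a Hessian-vector product or a finite-difference estimate, which this sketch does not cover.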
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Sparse and Compressive Sensing Techniques
