Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning
Tuyen Trung Truong, Tuan Hang Nguyen

TL;DR
This paper studies a backtracking gradient descent method applicable to all $C^1$ functions, including the cost functions of deep neural networks. It proves convergence properties and reports stronger empirical performance than existing optimizers.
Contribution
It provides a theoretical convergence analysis for backtracking gradient descent on general $C^1$ functions and introduces modifications that outperform current state-of-the-art optimizers in deep learning.
Findings
Backtracking GD converges for all Morse functions: the iterates either converge to a critical point or diverge to infinity in norm.
The new algorithms outperform Adam, RMSProp, and others on CIFAR datasets.
The method automatically tunes learning rates effectively.
Abstract
While standard gradient descent is a very popular optimisation method, its convergence cannot be proven beyond the class of functions whose gradient is globally Lipschitz continuous. As such, it is not actually applicable to realistic applications such as Deep Neural Networks. In this paper, we prove that its backtracking variant behaves very nicely; in particular, convergence can be shown for all Morse functions. The main theoretical result of this paper is as follows. Theorem. Let $f:\mathbb{R}^k\rightarrow\mathbb{R}$ be a $C^1$ function, and $\{z_n\}$ a sequence constructed from the Backtracking gradient descent algorithm. (1) Either $\lim_{n\rightarrow\infty}\|z_n\|=\infty$ or $\lim_{n\rightarrow\infty}\|z_{n+1}-z_n\|=0$. (2) Assume that $f$ has at most countably many critical points. Then either $\lim_{n\rightarrow\infty}\|z_n\|=\infty$ or $\{z_n\}$ converges to a critical…
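To make the backtracking idea concrete, here is a minimal sketch of gradient descent with an Armijo-type backtracking line search, written in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `backtracking_gd` and the parameters `delta0` (initial step size), `alpha` (sufficient-decrease constant), `beta` (shrink factor), and `max_backtracks` are hypothetical names chosen for this example, and the quadratic test function is arbitrary.

```python
import numpy as np


def backtracking_gd(f, grad_f, x0, delta0=1.0, alpha=0.5, beta=0.5,
                    max_iters=1000, tol=1e-8, max_backtracks=60):
    """Gradient descent with Armijo-type backtracking (illustrative sketch).

    At each iterate x, the step size starts at delta0 and is multiplied
    by beta until the sufficient-decrease condition holds:
        f(x - delta * g) - f(x) <= -alpha * delta * ||g||^2,  g = grad_f(x).
    All parameter names and defaults are assumptions made for this example.
    """
    x = np.asarray(x0, dtype=float)
    step_sizes = []
    for _ in range(max_iters):
        g = grad_f(x)
        g_norm_sq = float(np.dot(g, g))
        if g_norm_sq < tol ** 2:
            break  # gradient is numerically zero: near a critical point
        fx = f(x)
        delta = delta0
        for _ in range(max_backtracks):
            if f(x - delta * g) - fx <= -alpha * delta * g_norm_sq:
                break  # sufficient decrease achieved
            delta *= beta  # otherwise shrink the step and try again
        # If the condition never held, the smallest tried step is used.
        x = x - delta * g
        step_sizes.append(delta)
    return x, step_sizes


# Usage on an ill-conditioned quadratic (arbitrary example function).
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])


x_final, steps = backtracking_gd(f, grad_f, [3.0, -2.0])
print("final point:", x_final)
print("smallest accepted step size:", min(steps))
```

No learning rate has to be known in advance here: the inner loop selects a step size at every iteration, which is the sense in which a backtracking scheme tunes the learning rate automatically. For large-scale deep learning, evaluating `f` several times per step has a cost that this sketch does not attempt to address.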
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems
Methods: RMSProp · Adam
