Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning

Tuyen Trung Truong; Tuan Hang Nguyen

arXiv:1808.05160 · math.OC · March 2, 2021

TL;DR

This paper introduces a backtracking gradient descent method applicable to all $C^1$ functions, including the loss functions of deep neural networks, proves convergence properties for it, and reports better empirical performance than existing optimizers.

Contribution

It provides a theoretical convergence analysis for backtracking gradient descent on general $C^1$ functions and introduces modifications that outperform current state-of-the-art optimizers in deep learning.

Findings

01

Backtracking GD converges for all Morse functions.

02

The new algorithms outperform Adam, RMSProp, and others on CIFAR datasets.

03

The method automatically tunes learning rates effectively.

Abstract

While standard gradient descent is a very popular optimisation method, its convergence cannot be proven beyond the class of functions whose gradient is globally Lipschitz continuous. As such, it is not actually applicable to realistic applications such as Deep Neural Networks. In this paper, we prove that its backtracking variant behaves very nicely; in particular, convergence can be shown for all Morse functions. The main theoretical result of this paper is as follows. Theorem. Let $f : \mathbb{R}^{k} \to \mathbb{R}$ be a $C^{1}$ function, and $\{z_{n}\}$ a sequence constructed from the Backtracking gradient descent algorithm. (1) Either $\lim_{n \to \infty} \|z_{n}\| = \infty$ or $\lim_{n \to \infty} \|z_{n+1} - z_{n}\| = 0$. (2) Assume that $f$ has at most countably many critical points. Then either $\lim_{n \to \infty} \|z_{n}\| = \infty$ or $\{z_{n}\}$ converges to a critical…
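
To make the algorithm named in the abstract concrete, here is a minimal sketch of gradient descent with a backtracking line search. The abstract does not spell out the step-size rule, so the sketch assumes the usual Armijo sufficient-decrease condition $f(z - \delta \nabla f(z)) \le f(z) - \alpha \delta \|\nabla f(z)\|^2$. The parameter names `delta0`, `alpha`, `beta`, the stopping tolerances, and the test function are all illustrative assumptions, not taken from the paper or its repository.

```python
import math

def backtracking_gd(f, grad, z0, delta0=1.0, alpha=0.5, beta=0.5,
                    max_iter=1000, tol=1e-8, min_step=1e-12):
    """Gradient descent with Armijo backtracking (illustrative sketch).

    delta0: initial trial step size (assumed name)
    alpha:  sufficient-decrease constant in (0, 1) (assumed name)
    beta:   factor in (0, 1) by which the step is shrunk (assumed name)
    """
    z = list(z0)
    for _ in range(max_iter):
        g = grad(z)
        g_norm_sq = sum(gi * gi for gi in g)
        if math.sqrt(g_norm_sq) < tol:
            break  # gradient is numerically zero: near a critical point
        fz = f(z)
        delta = delta0
        z_new = [zi - delta * gi for zi, gi in zip(z, g)]
        # Shrink the step until the Armijo condition holds:
        #   f(z - delta*g) <= f(z) - alpha * delta * ||g||^2
        # min_step is a safeguard against looping forever in floating point.
        while f(z_new) > fz - alpha * delta * g_norm_sq and delta > min_step:
            delta *= beta
            z_new = [zi - delta * gi for zi, gi in zip(z, g)]
        z = z_new
    return z

# Hypothetical test function: a Morse function whose gradient is not
# globally Lipschitz. Critical points: minima at (+-1, 0), saddle at (0, 0).
def f(z):
    x, y = z
    return (x * x - 1.0) ** 2 + y * y

def grad_f(z):
    x, y = z
    return [4.0 * x * (x * x - 1.0), 2.0 * y]

z_start = [3.0, -2.0]
z_star = backtracking_gd(f, grad_f, z_start)
print(z_star, f(z_star))
```

With a fixed step size of `1.0`, the first update from `z_start` would overshoot badly because the gradient grows cubically in `x`. The backtracking loop instead shrinks the step until the objective decreases enough, which is the behaviour that the convergence statement concerns.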

Code & Models

Repositories

hank-nguyen/MBT-optimizer (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems

Methods: RMSProp · Adam