Asymptotic behaviour of learning rates in Armijo's condition

Tuyen Trung Truong; Tuan Hang Nguyen

arXiv:2007.03618 · math.OC · July 8, 2020 · 5 cites

TL;DR

This paper analyzes the asymptotic behavior of learning rates in Armijo's condition within Backtracking Gradient Descent, showing boundedness near non-degenerate critical points and exploring differences at degenerate points, supported by experiments.

Contribution

It provides a theoretical characterization of learning rate bounds in Backtracking GD near critical points and clarifies the units of the learning rate in this context.

Findings

01 When the iterates converge to a non-degenerate critical point, the learning rates $\delta_{n}$ satisfying Armijo's condition are bounded.

02 Behavior differs significantly at degenerate critical points.

03 Backtracking GD's learning rate has a meaningful physical unit.

Abstract

Fix a constant $0 < \alpha < 1$. For a $C^{1}$ function $f : \mathbb{R}^{k} \to \mathbb{R}$, a point $x$ and a positive number $\delta > 0$, we say that Armijo's condition is satisfied if $f(x - \delta \nabla f(x)) - f(x) \leq -\alpha \delta \|\nabla f(x)\|^{2}$. It is a basis for the well known Backtracking Gradient Descent (Backtracking GD) algorithm. Consider a sequence $\{x_{n}\}$ defined by $x_{n+1} = x_{n} - \delta_{n} \nabla f(x_{n})$, for positive numbers $\delta_{n}$ for which Armijo's condition is satisfied. We show that if $\{x_{n}\}$ converges to a non-degenerate critical point, then $\{\delta_{n}\}$ must be bounded. Moreover, this boundedness can be quantified in terms of the norms of the Hessian $\nabla^{2} f$ and its inverse at the limit point. This complements the first author's results on Unbounded Backtracking GD, and shows that in case of convergence to a non-degenerate critical point the…
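
To make the condition in the abstract concrete, here is a minimal Python sketch of an Armijo check and a standard backtracking loop. It is an illustration under stated assumptions, not the authors' code: the function names, the shrink factor `beta`, the initial trial rate `delta0`, and the test quadratic are all hypothetical choices that do not come from the paper. Note that this classical variant caps every learning rate at `delta0`, so boundedness from above is automatic here; the paper's result concerns any sequence of rates satisfying Armijo's condition.

```python
def armijo_holds(f, grad_f, x, delta, alpha):
    """Return True if f(x - delta*g) - f(x) <= -alpha*delta*||g||^2, with g = grad f(x)."""
    g = grad_f(x)
    x_new = [xi - delta * gi for xi, gi in zip(x, g)]
    grad_sq = sum(gi * gi for gi in g)
    return f(x_new) - f(x) <= -alpha * delta * grad_sq


def backtracking_gd(f, grad_f, x0, alpha=0.5, delta0=1.0, beta=0.5, steps=50):
    """Classical backtracking GD: shrink delta by beta until Armijo's condition holds.

    alpha is the constant from Armijo's condition; delta0 and beta are
    assumed hyperparameters of this sketch, not values from the paper.
    """
    x = list(x0)
    deltas = []
    for _ in range(steps):
        g = grad_f(x)
        if sum(gi * gi for gi in g) == 0.0:
            break  # already at a critical point
        delta = delta0
        while not armijo_holds(f, grad_f, x, delta, alpha):
            delta *= beta
        x = [xi - delta * gi for xi, gi in zip(x, g)]
        deltas.append(delta)
    return x, deltas


# Hypothetical test function: f(x, y) = (x^2 + 10*y^2) / 2, whose only
# critical point (the origin) is non-degenerate with Hessian diag(1, 10).
def f(p):
    return 0.5 * (p[0] ** 2 + 10.0 * p[1] ** 2)


def grad_f(p):
    return [p[0], 10.0 * p[1]]


x_final, deltas = backtracking_gd(f, grad_f, [1.0, 1.0])
print("final point:", x_final)
print("smallest and largest accepted learning rates:", min(deltas), max(deltas))
```

For a quadratic $f(x) = \frac{1}{2} x^{T} A x$ with $A$ positive definite, expanding $f(x - \delta g)$ with $g = \nabla f(x) = A x$ shows that Armijo's condition is equivalent to $\delta \leq 2 (1 - \alpha) \|g\|^{2} / (g^{T} A g)$, so admissible learning rates are bounded in this special case regardless of the trial rate. This is a sanity check on a toy example only; the quantitative bound for general $f$ is the one stated in the paper.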

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Model Reduction and Neural Networks