Incorporating the Barzilai-Borwein Adaptive Step Size into Subgradient Methods for Deep Network Training

Antonio Robles-Kelly; Asef Nazari

arXiv:2205.13711·cs.LG·May 30, 2022

TL;DR

This paper introduces a novel adaptive learning rate method for deep network training by integrating the Barzilai-Borwein step size into gradient descent algorithms, leading to smoother and faster convergence.

Contribution

It presents a general approach to incorporate Barzilai-Borwein step size into popular gradient descent methods like Adagrad and RMSprop for improved training efficiency.

Findings

1. Faster convergence compared to standard methods
2. Smoother training dynamics observed
3. Comparable or better final performance

Abstract

In this paper, we incorporate the Barzilai-Borwein step size into gradient descent methods used to train deep networks. This allows us to adapt the learning rate using a two-point approximation to the secant equation which quasi-Newton methods are based upon. Moreover, the adaptive learning rate method presented here is quite general in nature and can be applied to widely used gradient descent approaches such as Adagrad and RMSprop. We evaluate our method using standard example network architectures on widely available datasets and compare against alternatives elsewhere in the literature. In our experiments, our adaptive learning rate shows a smoother and faster convergence than that exhibited by the alternatives, with better or comparable performance.
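As a rough illustration of the two-point secant idea mentioned in the abstract, the sketch below applies the classical Barzilai-Borwein "long" step size, alpha = (s·s)/(s·y) with s = x_k - x_{k-1} and y = g_k - g_{k-1}, to plain gradient descent on a small convex quadratic. It is not the authors' implementation and does not show how the paper combines the step with Adagrad or RMSprop. The function names (`bb_step`, `bb_gradient_descent`), the fallback step, and the test problem are assumptions made for this example.

```python
import numpy as np


def bb_step(x_prev, x_curr, g_prev, g_curr, fallback=1e-3):
    """Classical BB "long" step: alpha = (s.s) / (s.y).

    s is the change in the iterate and y the change in the gradient.
    The fallback value is an assumption of this sketch, used when the
    curvature estimate s.y is not positive.
    """
    s = x_curr - x_prev
    y = g_curr - g_prev
    sy = float(s @ y)
    if sy <= 0.0:
        return fallback
    return float(s @ s) / sy


def bb_gradient_descent(grad, x0, alpha0=1e-3, iters=200, tol=1e-10):
    """Plain gradient descent whose step size is recomputed by bb_step."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    # The first step has no history, so it uses the fixed step alpha0.
    x = x_prev - alpha0 * g_prev
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        alpha = bb_step(x_prev, x, g_prev, g, fallback=alpha0)
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x


# Hypothetical test problem: minimise 0.5 * x^T A x - b^T x, gradient A x - b.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 2.0])
x_star = bb_gradient_descent(lambda x: A @ x - b, x0=np.zeros(2))
print(x_star, np.linalg.norm(A @ x_star - b))
```

On a quadratic, s·y is always positive, so the fallback is never triggered after the first step. With stochastic mini-batch gradients, as in deep network training, s·y can be noisy or negative, which is one reason a practical method needs additional safeguards beyond this sketch.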

Taxonomy

Methods: AdaGrad