Incorporating the Barzilai-Borwein Adaptive Step Size into Subgradient Methods for Deep Network Training
Antonio Robles-Kelly, Asef Nazari

TL;DR
This paper introduces a novel adaptive learning rate method for deep network training by integrating the Barzilai-Borwein step size into gradient descent algorithms, leading to smoother and faster convergence.
Contribution
It presents a general approach to incorporate Barzilai-Borwein step size into popular gradient descent methods like Adagrad and RMSprop for improved training efficiency.
Findings
Faster convergence than the alternative methods it was compared against
Smoother convergence during training
Better or comparable final performance
Abstract
In this paper, we incorporate the Barzilai-Borwein step size into gradient descent methods used to train deep networks. This allows us to adapt the learning rate using a two-point approximation to the secant equation which quasi-Newton methods are based upon. Moreover, the adaptive learning rate method presented here is quite general in nature and can be applied to widely used gradient descent approaches such as Adagrad and RMSprop. We evaluate our method using standard example network architectures on widely available datasets and compare against alternatives elsewhere in the literature. In our experiments, our adaptive learning rate shows a smoother and faster convergence than that exhibited by the alternatives, with better or comparable performance.
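To make the "two-point approximation to the secant equation" concrete, the sketch below uses the classic Barzilai-Borwein step size, alpha_k = (s^T s) / (s^T y), where s = x_k - x_{k-1} is the change in parameters and y = g_k - g_{k-1} is the change in gradients. It is only an illustration on a toy quadratic with plain gradient descent, not the authors' implementation: the abstract does not say how the step is combined with Adagrad or RMSprop, and the names `bb_step`, `grad`, and `minimize`, the fallback rule, and the test function are all assumptions made for this example.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def bb_step(x_prev, x_curr, g_prev, g_curr, fallback=1e-2, eps=1e-12):
    """Classic Barzilai-Borwein step size (the "long" BB1 variant).

    s = x_curr - x_prev and y = g_curr - g_prev. The scalar alpha that best
    satisfies the secant equation (1/alpha) * s = y in the least-squares
    sense is (s.s) / (s.y). The fallback for a non-positive or tiny
    curvature estimate is an assumption of this sketch, not from the paper.
    """
    s = [c - p for c, p in zip(x_curr, x_prev)]
    y = [c - p for c, p in zip(g_curr, g_prev)]
    sy = dot(s, y)
    if sy <= eps:
        return fallback
    return dot(s, s) / sy


def grad(x):
    """Gradient of the toy objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return [1.0 * x[0], 10.0 * x[1]]


def minimize(x0, steps=100, alpha0=1e-2):
    """Gradient descent whose learning rate is re-estimated by bb_step."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    # The first update has no history, so it uses a fixed initial step.
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(steps):
        g = grad(x)
        alpha = bb_step(x_prev, x, g_prev, g, fallback=alpha0)
        x_prev, g_prev = x, g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    print(minimize([1.0, 1.0]))
```

Because the step is computed from differences of iterates and gradients alone, it needs no Hessian and no line search. In stochastic deep-network training the gradients are noisy, so a practical variant would likely need smoothing or safeguards beyond the simple fallback used here.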
Taxonomy
Methods: AdaGrad
