A Stochastic Quasi-Newton Method with Nesterov's Accelerated Gradient
S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Hideki Asai

TL;DR
This paper introduces a stochastic quasi-Newton method enhanced with Nesterov's acceleration for large-scale non-convex optimization, demonstrating improved convergence over traditional methods in neural network training.
Contribution
It presents a novel stochastic quasi-Newton algorithm with Nesterov's acceleration, applicable in both full and limited memory forms, for neural network optimization.
Findings
Outperforms classical oBFGS and oLBFGS methods.
Achieves better results than SGD and Adam.
Effective across various momentum rates and batch sizes.
Abstract
Incorporating second-order curvature information in gradient-based methods has been shown to improve convergence drastically, despite its computational intensity. In this paper, we propose a stochastic (online) quasi-Newton method with Nesterov's accelerated gradient, in both its full and limited memory forms, for solving large-scale non-convex optimization problems in neural networks. The performance of the proposed algorithm is evaluated in TensorFlow on benchmark classification and regression problems. The results show improved performance compared to the classical second-order oBFGS and oLBFGS methods and popular first-order stochastic methods such as SGD and Adam. The performance with different momentum rates and batch sizes is also illustrated.
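To make the idea in the abstract concrete, the following is a minimal NumPy sketch of a full-memory BFGS step evaluated at a Nesterov look-ahead point, with both gradients of a step taken on the same mini-batch. It is an illustration under stated assumptions, not the paper's algorithm: the function names (bfgs_inverse_update, naq_step, make_batch_grad), the fixed step size alpha, the momentum mu, and the toy noise-free regression problem are all hypothetical choices, and details the abstract does not give (step-size schedule, damping, normalization, the limited memory variant) are omitted.

```python
import numpy as np


def bfgs_inverse_update(H, s, y, eps=1e-10):
    """Standard BFGS update of the inverse-Hessian approximation H."""
    sy = float(s @ y)
    if not sy > eps:  # skip the update when the curvature condition fails
        return H
    rho = 1.0 / sy
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)


def naq_step(w, v, H, grad_fn, mu=0.8, alpha=0.5):
    """One quasi-Newton step taken from the Nesterov look-ahead point.

    grad_fn is assumed to evaluate the gradient on one fixed mini-batch,
    so both gradient calls below see the same data.
    """
    w_look = w + mu * v                    # look-ahead point
    g_look = grad_fn(w_look)               # gradient at the look-ahead point
    v_new = mu * v - alpha * (H @ g_look)  # momentum plus quasi-Newton direction
    w_new = w + v_new
    s = w_new - w_look                     # parameter difference
    y = grad_fn(w_new) - g_look            # gradient difference
    return w_new, v_new, bfgs_inverse_update(H, s, y)


# Toy usage: noise-free linear regression with random mini-batches (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
w_true = rng.normal(size=5)
t = X @ w_true


def make_batch_grad(idx):
    """Return the mean-squared-error gradient restricted to rows idx."""
    Xb, tb = X[idx], t[idx]
    return lambda w: Xb.T @ (Xb @ w - tb) / len(idx)


w, v, H = np.zeros(5), np.zeros(5), np.eye(5)
for _ in range(200):
    idx = rng.choice(len(X), size=32, replace=False)
    w, v, H = naq_step(w, v, H, make_batch_grad(idx))
print("distance to w_true:", np.linalg.norm(w - w_true))
```

The only difference from a plain online BFGS step in this sketch is where the gradient is evaluated: at w + mu * v rather than at w. Storing the dense matrix H costs memory quadratic in the number of parameters, which is the usual motivation for a limited memory form.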
Taxonomy
Methods: Adam · Stochastic Gradient Descent
