A Stochastic Variance Reduced Nesterov's Accelerated Quasi-Newton Method

Sota Yasuda, Shahrzad Mahboubi, S. Indrapriyadarsini, Hiroshi Ninomiya, and Hideki Asai

arXiv:1910.07939·cs.LG·October 16, 2020


TL;DR

This paper introduces a stochastic variance reduced Nesterov's Accelerated Quasi-Newton method to improve training efficiency on large-scale neural network problems. It reports improved performance over conventional methods on four regression and classification benchmarks.

Contribution

The paper proposes the SVR-NAQ (full-memory) and SVRLNAQ (limited-memory) algorithms, which incorporate variance reduction into Nesterov's accelerated quasi-Newton methods for the first time.
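As a rough illustration of what variance reduction means in this setting, the sketch below assumes an SVRG-style (stochastic variance reduced gradient) estimator, a common choice; the paper's exact estimator, and the point at which it is evaluated, may differ. The toy least-squares loss and the function names sample_grad, full_grad and svrg_grad are hypothetical. The key property is that the corrected per-sample gradient stays unbiased while its noise shrinks as the current weights approach the snapshot weights.

```python
from typing import List

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def sample_grad(w: Vector, a: Vector, b: float) -> Vector:
    # Gradient of the hypothetical per-sample loss f_i(w) = 0.5 * (a . w - b)^2.
    r = dot(a, w) - b
    return [r * aj for aj in a]


def full_grad(w: Vector, A: List[Vector], B: List[float]) -> Vector:
    # Average of the per-sample gradients over the whole data set.
    n = len(A)
    g = [0.0] * len(w)
    for a, b in zip(A, B):
        g = [gj + gij / n for gj, gij in zip(g, sample_grad(w, a, b))]
    return g


def svrg_grad(w: Vector, w_snap: Vector, mu_snap: Vector,
              a: Vector, b: float) -> Vector:
    # SVRG-style estimate: grad f_i(w) - grad f_i(w_snap) + full_grad(w_snap).
    # Averaged over all samples i it equals full_grad(w); the correction term
    # cancels much of the sampling noise when w is close to w_snap.
    g_w = sample_grad(w, a, b)
    g_snap = sample_grad(w_snap, a, b)
    return [x - y + m for x, y, m in zip(g_w, g_snap, mu_snap)]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
    B = [1.0, 0.0, 2.0]
    w_snap = [0.0, 0.0]
    mu_snap = full_grad(w_snap, A, B)  # computed once per outer "epoch"
    w = [0.1, -0.2]
    estimates = [svrg_grad(w, w_snap, mu_snap, a, b) for a, b in zip(A, B)]
    mean = [sum(col) / len(estimates) for col in zip(*estimates)]
    print(mean, full_grad(w, A, B))
```

The full gradient at the snapshot is the expensive part and is refreshed only occasionally, so each inner step still costs only a few per-sample gradient evaluations.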

Findings

01 Improved convergence speed over traditional methods

02 Effective in both regression and classification benchmarks

03 Reduced stochastic noise in large-scale training

Abstract

Recently, algorithms incorporating second-order curvature information have become popular for training neural networks. Nesterov's Accelerated Quasi-Newton (NAQ) method has been shown to effectively accelerate the BFGS quasi-Newton method by incorporating a momentum term and Nesterov's accelerated gradient vector. A stochastic version of the NAQ method was proposed for training large-scale problems; however, it suffers from high stochastic variance noise. This paper proposes a stochastic variance reduced Nesterov's Accelerated Quasi-Newton method in full-memory (SVR-NAQ) and limited-memory (SVRLNAQ) forms. The performance of the proposed method is evaluated in TensorFlow on four benchmark problems: two regression and two classification problems. The results show improved performance compared to conventional methods.
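The deterministic NAQ update that the abstract builds on can be sketched as follows, under stated assumptions: the gradient is evaluated at the Nesterov look-ahead point w + mu*v, the momentum v is updated with the quasi-Newton direction -H * grad(w + mu*v), and H is refreshed with the standard BFGS inverse-Hessian formula using p = w_{k+1} - (w_k + mu*v_k) and q = grad(w_{k+1}) - grad(w_k + mu*v_k). This reflects one reading of NAQ, not the authors' code. The fixed step size alpha stands in for any line search, and the names naq, bfgs_update and axpy are hypothetical. In the stochastic variants, grad would presumably be a minibatch (and, for SVR-NAQ, variance-reduced) estimate.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(M: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in M]


def axpy(x: Vector, y: Vector, s: float = 1.0) -> Vector:
    # Returns x + s * y.
    return [xi + s * yi for xi, yi in zip(x, y)]


def bfgs_update(H: Matrix, p: Vector, q: Vector) -> Matrix:
    # Inverse-Hessian BFGS update
    #   H+ = (I - rho p q^T) H (I - rho q p^T) + rho p p^T,  rho = 1 / (q^T p),
    # written out element-wise (valid because H is kept symmetric).
    qp = dot(q, p)
    if qp <= 1e-12:  # skip the update if the curvature condition fails
        return H
    rho = 1.0 / qp
    Hq = matvec(H, q)
    c = rho * rho * dot(q, Hq) + rho
    n = len(p)
    return [[H[i][j] - rho * (p[i] * Hq[j] + Hq[i] * p[j]) + c * p[i] * p[j]
             for j in range(n)] for i in range(n)]


def naq(grad: Callable[[Vector], Vector], w0: Vector,
        mu: float = 0.8, alpha: float = 0.2, iters: int = 200) -> Vector:
    n = len(w0)
    w, v = list(w0), [0.0] * n
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        w_look = axpy(w, v, mu)                  # Nesterov point w_k + mu * v_k
        g_look = grad(w_look)
        d = [-x for x in matvec(H, g_look)]      # quasi-Newton direction -H g
        v = axpy([mu * x for x in v], d, alpha)  # v_{k+1} = mu v_k + alpha d
        w = axpy(w, v)                           # w_{k+1} = w_k + v_{k+1}
        p = axpy(w, w_look, -1.0)                # p = w_{k+1} - (w_k + mu v_k)
        q = axpy(grad(w), g_look, -1.0)          # q = grad(w_{k+1}) - grad(w_look)
        H = bfgs_update(H, p, q)
    return w


if __name__ == "__main__":
    # Toy quadratic E(w) = 0.5 w^T Q w - b^T w, minimizer Q^{-1} b = [0.2, 0.4].
    Q = [[3.0, 1.0], [1.0, 2.0]]
    b = [1.0, 1.0]
    print(naq(lambda w: axpy(matvec(Q, w), b, -1.0), [1.0, -1.0]))
```

The limited-memory form (SVRLNAQ) would replace the dense matrix H with a short history of (p, q) pairs, which is what makes a quasi-Newton method practical for networks with many parameters.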
