Faster Stochastic Quasi-Newton Methods

Qingsong Zhang, Feihu Huang, Cheng Deng, and Heng Huang

arXiv:2004.06479 · math.OC · February 26, 2021 · IEEE Trans. Neural Networks Learn. Syst. · 1 citation

TL;DR

This paper introduces SpiderSQN, a faster stochastic quasi-Newton method that attains the best known SFO complexity bounds for nonconvex optimization and outperforms existing methods in experiments.

Contribution

The paper proposes SpiderSQN, a novel stochastic quasi-Newton algorithm that reaches the best known SFO complexity and improves practical performance by incorporating momentum schemes.

Findings

1. Achieves the best known SFO complexity of O(n + n^{1/2} ε^{-2}) in the finite-sum setting for finding an ε-first-order stationary point.
2. Matches the best known SFO complexity of O(ε^{-3}) in the online setting.
3. Outperforms state-of-the-art methods for nonconvex optimization on benchmark datasets.
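To give a feel for the two bounds above, the snippet below evaluates them with every hidden constant set to 1, which is an illustrative assumption rather than anything stated in the paper. For comparison it also evaluates two standard baselines for smooth nonconvex problems: full gradient descent at O(n ε^{-2}) and plain SGD at O(ε^{-4}). The function names are hypothetical.

```python
import math


def sfo_finite_sum(n, eps):
    """SpiderSQN finite-sum bound O(n + n^{1/2} eps^{-2}), constants set to 1."""
    return n + math.sqrt(n) / eps**2


def sfo_online(eps):
    """SpiderSQN online bound O(eps^{-3}), constant set to 1."""
    return eps**-3


def sfo_gd(n, eps):
    """Baseline: full gradient descent, O(n eps^{-2})."""
    return n / eps**2


def sfo_sgd(eps):
    """Baseline: plain SGD, O(eps^{-4})."""
    return eps**-4


n, eps = 10**6, 1e-2
for name, cost in [
    ("finite-sum (SpiderSQN bound)", sfo_finite_sum(n, eps)),
    ("full gradient descent", sfo_gd(n, eps)),
    ("online (SpiderSQN bound)", sfo_online(eps)),
    ("plain SGD", sfo_sgd(eps)),
]:
    print(f"{name}: {cost:.2e}")
```

With n = 10^6 and ε = 10^-2, the finite-sum bound is about 1.1e7 oracle calls versus 1e10 for full gradient descent. The n^{1/2} factor on the ε^{-2} term is where the savings come from.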

Abstract

Stochastic optimization methods have become a popular class of optimization tools in machine learning. In particular, stochastic gradient descent (SGD) is widely used for machine learning problems such as training neural networks because of its low per-iteration computational complexity. Meanwhile, Newton and quasi-Newton methods, which leverage second-order information, can achieve better solutions than first-order methods. Stochastic quasi-Newton (SQN) methods have therefore been developed to reach better solutions more efficiently than stochastic first-order methods by using approximate second-order information. However, existing SQN methods still do not attain the best known stochastic first-order oracle (SFO) complexity. To fill this gap, we propose a novel faster stochastic quasi-Newton method (SpiderSQN) based on the variance-reduction technique of SPIDER. We prove that…
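The abstract describes SpiderSQN as combining SPIDER variance reduction with quasi-Newton steps. Below is a minimal sketch of that combination under stated assumptions, not the authors' algorithm. It assumes a least-squares finite sum f(x) = (1/n) Σ_i 0.5 (a_i · x − b_i)^2, an L-BFGS two-loop recursion for the inverse-Hessian approximation, a fixed step size, and curvature pairs (s, y) built from the same minibatch gradient differences that drive the SPIDER correction. All names and parameter values (`spider_lbfgs`, `eta`, `q`, `batch`, `memory`) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Return alpha * x + y elementwise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def batch_grad(x, A, B, idx):
    """Average gradient of f_i(x) = 0.5 * (a_i . x - b_i)^2 over the indices idx."""
    g = [0.0] * len(x)
    for i in idx:
        r = dot(A[i], x) - B[i]
        g = axpy(r, A[i], g)
    return [gi / len(idx) for gi in g]


def two_loop(v, pairs):
    """L-BFGS two-loop recursion: approximates H v from curvature pairs (s, y)."""
    q = list(v)
    alphas = []
    for s, y in reversed(pairs):  # newest pair first
        a = dot(s, q) / dot(y, s)
        q = axpy(-a, y, q)
        alphas.append(a)
    gamma = 1.0
    if pairs:
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)  # initial scaling H0 = gamma * I
    r = [gamma * qi for qi in q]
    for (s, y), a in zip(pairs, reversed(alphas)):  # oldest pair first
        b = dot(y, r) / dot(y, s)
        r = axpy(a - b, s, r)
    return r


def spider_lbfgs(A, B, x0, eta=0.1, q=10, batch=20, memory=5, iters=150, seed=0):
    """Hypothetical SPIDER + L-BFGS sketch; not the paper's SpiderSQN."""
    rng = random.Random(seed)
    full = range(len(A))
    x = list(x0)
    v = batch_grad(x, A, B, full)  # SPIDER starts from a full gradient
    pairs = []
    for k in range(1, iters + 1):
        x_new = axpy(-eta, two_loop(v, pairs), x)  # quasi-Newton step on estimator v
        S = rng.sample(full, batch)
        # Minibatch gradient difference: serves as SPIDER correction and curvature y.
        y = axpy(-1.0, batch_grad(x, A, B, S), batch_grad(x_new, A, B, S))
        s = axpy(-1.0, x, x_new)
        if k % q == 0:
            v = batch_grad(x_new, A, B, full)  # periodic full-gradient refresh
        else:
            v = axpy(1.0, y, v)  # v_k = v_{k-1} + grad_S(x_k) - grad_S(x_{k-1})
        if dot(s, y) > 1e-8 * dot(s, s):  # keep only pairs with positive curvature
            pairs = (pairs + [(s, y)])[-memory:]
        x = x_new
    return x


# Usage on a synthetic, consistent least-squares problem.
rng = random.Random(1)
x_true = [1.0, -2.0, 0.5]
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(400)]
B = [dot(a, x_true) for a in A]
x0 = [0.0, 0.0, 0.0]
x_hat = spider_lbfgs(A, B, x0)
full = range(len(A))
g0 = batch_grad(x0, A, B, full)
g1 = batch_grad(x_hat, A, B, full)
print(math.sqrt(dot(g0, g0)), math.sqrt(dot(g1, g1)))
```

In the SPIDER analysis, both the refresh period q and the minibatch size are typically chosen on the order of n^{1/2}. The values here are only small enough to run quickly. In the online setting, the full-gradient refresh would be replaced by a large-minibatch estimate.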

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications