Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II

Jiaojiao Zhang; Huikang Liu; Anthony Man-Cho So; Qing Ling

arXiv:2201.07733 · math.OC · January 20, 2022


TL;DR

This paper introduces two fully decentralized stochastic quasi-Newton methods, damped regularized limited-memory DFP and damped limited-memory BFGS. Both construct Hessian inverse approximations locally for decentralized learning and outperform existing decentralized stochastic first-order methods.

Contribution

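The paper specifies two new decentralized stochastic quasi-Newton algorithms that adaptively build positive definite Hessian inverse approximations with bounded eigenvalues, without extra sampling or communication. This satisfies the assumption of the Part I framework and therefore guarantees linear convergence.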
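For background, the textbook DFP update of an inverse-Hessian approximation H is sketched below. It uses a step s (change in the decision variable) and a difference y of gradient approximations. This is the classical dense update only. The paper's damped regularized limited-memory DFP modifies it in ways not reproduced here, and the function name `dfp_inverse_update` is hypothetical.

```python
import numpy as np

def dfp_inverse_update(H, s, y):
    """Textbook DFP update of an inverse-Hessian approximation H (not the paper's variant).

    s: change in the decision variable; y: change in the gradient (approximation).
    The curvature condition s^T y > 0 keeps the updated H positive definite.
    """
    sy = s @ y
    if sy <= 0:
        raise ValueError("curvature condition s^T y > 0 violated")
    Hy = H @ y
    # H_new = H - (H y y^T H) / (y^T H y) + (s s^T) / (s^T y); satisfies H_new @ y == s
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / sy
```

On a quadratic with Hessian A, taking y = A s makes the updated H satisfy the secant equation H y = s. A failing curvature condition is the situation that damping schemes are designed to handle.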

Findings

1. Both methods outperform existing decentralized stochastic first-order algorithms.

2. In damped regularized DFP, the regularization improves performance.

3. Limited-memory BFGS reduces storage and computational complexity.
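As a rough illustration of the storage point, compare a dense d×d Hessian inverse approximation with a window of M stored pairs. The dimensions below are illustrative assumptions, not values from the paper. Applying the limited-memory approximation to a vector, for example with the standard two-loop recursion, also costs O(Md) operations instead of the O(d²) of a dense matrix-vector product.

```latex
\underbrace{d^{2}}_{\text{dense } H}
\quad\text{vs.}\quad
\underbrace{2Md}_{M \text{ pairs } (s_j,\,y_j)\,\in\,\mathbb{R}^{d}\times\mathbb{R}^{d}},
\qquad
\text{e.g. } d = 10^{4},\ M = 10:\quad 10^{8} \ \text{vs.}\ 2\times 10^{5} \ \text{entries}.
```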

Abstract

In Part I of this work, we proposed a general framework of decentralized stochastic quasi-Newton methods, which converge linearly to the optimal solution under the assumption that the local Hessian inverse approximations have bounded positive eigenvalues. In Part II, we specify two fully decentralized stochastic quasi-Newton methods, damped regularized limited-memory DFP (Davidon-Fletcher-Powell) and damped limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno), to locally construct such Hessian inverse approximations without extra sampling or communication. Both methods use a fixed moving window of $M$ past local gradient approximations and local decision variables to adaptively construct positive definite Hessian inverse approximations with bounded eigenvalues, satisfying the assumption in Part I for linear convergence. For the proposed damped regularized…
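The sketch below shows how a moving window of M past pairs can define a positive definite inverse-Hessian approximation. It uses generic limited-memory BFGS with classical Powell-style damping, which is a standard technique and not necessarily the damping used in the paper. The reference curvature `delta` (damping uses B0 = delta·I, and the initial inverse is (1/delta)·I), the threshold `beta`, and the class name `DampedLBFGS` are hypothetical choices made for illustration.

```python
from collections import deque

import numpy as np


class DampedLBFGS:
    """Generic limited-memory BFGS with Powell-style damping (illustrative sketch).

    Keeps a moving window of the M most recent (s, y) pairs. Damping measures
    curvature against the hypothetical reference B0 = delta * I.
    """

    def __init__(self, M, delta=1.0, beta=0.2):
        self.pairs = deque(maxlen=M)  # the oldest pair is dropped automatically
        self.delta = delta            # hypothetical reference curvature
        self.beta = beta              # damping threshold (0.2 is the classical Powell choice)

    def update(self, s, y):
        sBs = self.delta * (s @ s)
        if sBs == 0:
            return  # skip a degenerate zero step
        sy = s @ y
        if sy < self.beta * sBs:
            # Powell damping: mix y with B0 s so that s^T y_bar = beta * s^T B0 s > 0
            theta = (1.0 - self.beta) * sBs / (sBs - sy)
            y = theta * y + (1.0 - theta) * self.delta * s
        self.pairs.append((s, y))

    def apply_inverse(self, g):
        """Two-loop recursion: returns H @ g for the current L-BFGS inverse approximation H."""
        q = np.array(g, dtype=float)
        alphas = []
        for s, y in reversed(self.pairs):  # newest to oldest
            rho = 1.0 / (y @ s)
            a = rho * (s @ q)
            q -= a * y
            alphas.append((rho, a))
        r = q / self.delta  # initial inverse H0 = (1 / delta) * I
        for (s, y), (rho, a) in zip(self.pairs, reversed(alphas)):  # oldest to newest
            b = rho * (y @ r)
            r += (a - b) * s
        return r
```

Because every stored pair satisfies s^T y > 0 after damping, the resulting H stays positive definite. In a decentralized setting one could imagine each node keeping its own instance and applying it to its local gradient approximation. How those local directions are combined across neighbors follows the Part I framework and is not shown here.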


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Random Matrices and Applications