SP2: A Second Order Stochastic Polyak Method
Shuang Li, William J. Swartworth, Martin Takáč, Deanna Needell, Robert M. Gower

TL;DR
SP2 is a second-order stochastic Polyak method that uses Hessian-vector products to speed up convergence. Its design requires neither convexity nor positive definite Hessians, and it is competitive on matrix completion, non-convex test problems, and logistic regression.
Contribution
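It develops SP2, a second-order extension of the SP method that solves the interpolation equations using a local second-order approximation of the model, computed with Hessian-vector products. The design does not rely on convexity, and a convergence theory is given for sums-of-quadratics.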
Findings
SP2 uses Hessian-vector products to speed up the convergence of the first-order SP method.
Competitive on matrix completion, non-convex test problems, and logistic regression.
No need for positive definite Hessians or convexity.
Abstract
Recently the "SP" (Stochastic Polyak step size) method has emerged as a competitive adaptive method for setting the step sizes of SGD. SP can be interpreted as a method specialized to interpolated models, since it solves the interpolation equations. SP solves these equations by using local linearizations of the model. We take a step further and develop a method for solving the interpolation equations that uses the local second-order approximation of the model. Our resulting method SP2 uses Hessian-vector products to speed up the convergence of SP. Furthermore, and rather uniquely among second-order methods, the design of SP2 in no way relies on positive definite Hessian matrices or convexity of the objective function. We show SP2 is very competitive on matrix completion, non-convex test problems and logistic regression. We also provide a convergence theory on sums-of-quadratics.
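To make the first-order baseline concrete, here is a minimal sketch of the SP step that SP2 builds on. It is not the SP2 update itself and is not taken from the paper's code. It assumes an interpolated model in which every sampled loss f_i attains the value 0 at a common minimizer, so linearizing f_i at the current point w and projecting onto the set where that linearization vanishes gives w - (f_i(w) / ||grad f_i(w)||^2) grad f_i(w). The toy least-squares problem and all function names (`sp_step`, `loss_and_grad`, `run_sp`) are hypothetical and chosen only for illustration.

```python
import random


def sp_step(w, f_val, grad, eps=1e-30):
    """One SP step, assuming the optimal value of the sampled loss is 0."""
    g2 = sum(g * g for g in grad)
    if g2 < eps:
        # Gradient is numerically zero: nothing to project onto, so stay put.
        return list(w)
    step = f_val / g2  # Polyak step size with f_i^* = 0 (interpolation assumption)
    return [wi - step * gi for wi, gi in zip(w, grad)]


def loss_and_grad(w, a, b):
    """Toy sampled loss f_i(w) = 0.5 * (a . w - b)^2 and its gradient."""
    r = sum(ai * wi for ai, wi in zip(a, w)) - b
    return 0.5 * r * r, [r * ai for ai in a]


def run_sp(data, dim, iters=2000, seed=0):
    """Run SP with uniform sampling over the data points."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(iters):
        a, b = rng.choice(data)
        f_val, grad = loss_and_grad(w, a, b)
        w = sp_step(w, f_val, grad)
    return w


# Consistent linear system: every equation is solved exactly by w_true,
# so the interpolation assumption holds by construction.
w_true = [1.0, -2.0, 0.5]
data_rng = random.Random(1)
data = []
for _ in range(20):
    a = [data_rng.gauss(0.0, 1.0) for _ in range(3)]
    b = sum(ai * wi for ai, wi in zip(a, w_true))
    data.append((a, b))

w_hat = run_sp(data, dim=3)
print("max abs error:", max(abs(x - y) for x, y in zip(w_hat, w_true)))
```

Per the abstract, SP2 replaces the linearization used in `sp_step` with a local second-order approximation of f_i, which brings in Hessian-vector products; that update is deliberately not reproduced in this sketch.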
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Gaussian Processes and Bayesian Inference
Methods: Test · Stochastic Gradient Descent
