Convergence analysis of stochastic gradient descent with adaptive preconditioning for non-convex and convex functions

Dmitrii A. Pasechnyuk; Alexander Gasnikov; Martin Takáč

arXiv:2308.14192 · math.OC · August 29, 2023

Open Access

TL;DR

This paper provides a theoretical framework for analysing how stochastic gradient descent with adaptive preconditioning converges on both convex and non-convex functions, with a convergence rate that depends more favourably on the Lipschitz constant.

Contribution

The paper introduces a simple theoretical analysis that gives convergence guarantees for stochastic gradient methods with adaptive preconditioning, applicable to both non-convex and convex functions.

Findings

01

Improved dependence of the convergence rate on the Lipschitz constant.

02

Applicable to stochastic gradient methods with unbiased gradient estimates.

03

Theoretical justification for preconditioning techniques in optimization.

Abstract

Preconditioning is a crucial operation in gradient-based numerical optimisation. It helps decrease the local condition number of a function by appropriately transforming its gradient. For a convex function, where the gradient can be computed exactly, the optimal linear transformation corresponds to the inverse of the Hessian operator, while the optimal convex transformation is the convex conjugate of the function. Different conditions result in variations of these dependencies. Practical algorithms often employ low-rank or stochastic approximations of the inverse Hessian matrix for preconditioning. However, theoretical guarantees for these algorithms typically lack a justification for the defining property of preconditioning. This paper presents a simple theoretical framework that demonstrates, given a smooth function and an available unbiased stochastic approximation of its gradient,…
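The abstract's remark about the exact-gradient convex case can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a hypothetical two-dimensional diagonal quadratic f(x) = 0.5 * sum(a_i * x_i^2), whose Hessian is diag(a_i), and compares plain gradient descent with a step that multiplies the gradient by the inverse Hessian. All names (`CURVATURES`, `plain_step`, `preconditioned_step`, `run`) are illustrative only.

```python
# Toy illustration (assumed setup, not the paper's algorithm):
# f(x) = 0.5 * sum(a_i * x_i**2), so the Hessian is diag(a_i).
CURVATURES = [100.0, 1.0]  # Hessian diagonal; condition number is 100


def gradient(x):
    """Exact gradient of the diagonal quadratic."""
    return [a * xi for a, xi in zip(CURVATURES, x)]


def plain_step(x, lr):
    """Ordinary gradient descent step."""
    g = gradient(x)
    return [xi - lr * gi for xi, gi in zip(x, g)]


def preconditioned_step(x, lr):
    """Gradient step with the gradient multiplied by the inverse Hessian."""
    g = gradient(x)
    return [xi - lr * gi / a for xi, gi, a in zip(x, g, CURVATURES)]


def run(step, x0, lr, iters):
    """Apply `step` repeatedly, starting from x0."""
    x = list(x0)
    for _ in range(iters):
        x = step(x, lr)
    return x


x0 = [1.0, 1.0]
# Plain GD must use a step size limited by the largest curvature (1/L).
x_plain = run(plain_step, x0, lr=1.0 / max(CURVATURES), iters=10)
# With the inverse Hessian as preconditioner, a unit step is safe.
x_precond = run(preconditioned_step, x0, lr=1.0, iters=1)

print("plain GD after 10 steps:      ", x_plain)
print("preconditioned after 1 step:  ", x_precond)
```

On this quadratic the preconditioned step lands on the minimiser in one iteration, while plain gradient descent shrinks the low-curvature coordinate by only a factor of 0.99 per step. Practical methods of the kind the abstract mentions replace the exact inverse Hessian with a low-rank or stochastic approximation, which this sketch does not attempt to model.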

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Numerical methods in inverse problems