Convergence analysis of stochastic gradient descent with adaptive preconditioning for non-convex and convex functions
Dmitrii A. Pasechnyuk, Alexander Gasnikov, Martin Takáč

TL;DR
This paper provides a theoretical framework for understanding how stochastic gradient descent with adaptive preconditioning converges on both convex and non-convex functions, improving the dependence of the convergence rate on the Lipschitz constant.
Contribution
It introduces a simple theoretical analysis that establishes convergence guarantees for stochastic gradient methods with adaptive preconditioning, applicable to both non-convex and convex functions.
Findings
Improved dependence of the convergence rate on the Lipschitz constant.
Applicable to stochastic gradient methods with unbiased gradient estimates.
Theoretical justification for preconditioning techniques in optimization.
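The findings concern stochastic gradient methods with adaptive preconditioning and unbiased gradient estimates. This page does not specify the paper's own preconditioner, so the sketch below swaps in a diagonal AdaGrad-style preconditioner, a standard example of adaptive preconditioning, purely for illustration. It runs on a hypothetical ill-conditioned quadratic with per-coordinate curvatures 100 and 1. The stochastic gradient is the exact gradient plus zero-mean Gaussian noise, which makes it unbiased. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def quad_value(x, curv):
    """f(x) = 0.5 * sum(c_i * x_i^2), a hypothetical test function."""
    return 0.5 * sum(c * xi * xi for c, xi in zip(curv, x))


def quad_grad(x, curv):
    """Exact gradient of quad_value: c_i * x_i per coordinate."""
    return [c * xi for c, xi in zip(curv, x)]


def adagrad_sgd(x0, curv, lr=0.5, noise=0.01, steps=2000, eps=1e-8, seed=0):
    """Diagonal AdaGrad-style preconditioned SGD (illustrative sketch,
    not the paper's algorithm)."""
    rng = random.Random(seed)
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared stochastic gradients
    for _ in range(steps):
        # Unbiased stochastic gradient: exact gradient plus zero-mean noise.
        g = [gi + rng.gauss(0.0, noise) for gi in quad_grad(x, curv)]
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # Diagonal preconditioner: scale coordinate i by 1 / sqrt(accum_i).
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x


curv = [100.0, 1.0]  # curvatures differ by a factor of 100
x0 = [1.0, 1.0]
x_final = adagrad_sgd(x0, curv)
print("f(x0) =", quad_value(x0, curv), " f(x_final) =", quad_value(x_final, curv))
```

With `noise=0.0`, both coordinates follow the same trajectory even though their curvatures differ by a factor of 100. Dividing by the root of the accumulated squared gradients cancels each coordinate's scale. This scale invariance is one informal way to see what an adaptive preconditioner buys.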
Abstract
Preconditioning is a crucial operation in gradient-based numerical optimisation. It helps decrease the local condition number of a function by appropriately transforming its gradient. For a convex function, where the gradient can be computed exactly, the optimal linear transformation corresponds to the inverse of the Hessian operator, while the optimal convex transformation is the convex conjugate of the function. Different conditions result in variations of these dependencies. Practical algorithms often employ low-rank or stochastic approximations of the inverse Hessian matrix for preconditioning. However, theoretical guarantees for these algorithms typically lack a justification for the defining property of preconditioning. This paper presents a simple theoretical framework that demonstrates, given a smooth function and an available unbiased stochastic approximation of its gradient,…
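The abstract states that, for a convex function with exact gradients, the optimal linear preconditioner is the inverse Hessian. A quadratic makes this easy to check, because its Hessian is the constant matrix A. The sketch below is an illustrative assumption rather than code from the paper. It compares gradient descent with step 1/L (L = 100, the largest eigenvalue) against a single step preconditioned by A^{-1}. The test problem is a hypothetical 2x2 quadratic f(x) = 0.5 x^T A x - b^T x with eigenvalues 100 and 1 and minimizer x* = (1, 0).

```python
import math


def matvec(A, v):
    """2x2 matrix-vector product."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def grad(A, b, x):
    """Gradient of f(x) = 0.5 x^T A x - b^T x, i.e. A x - b."""
    Ax = matvec(A, x)
    return [Ax[0] - b[0], Ax[1] - b[1]]


def gd_steps(A, b, x0, step, tol=1e-8, max_iter=100000):
    """Plain gradient descent; returns (iterations until ||grad|| < tol, x)."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(A, b, x)
        if math.hypot(g[0], g[1]) < tol:
            return k, x
        x = [x[0] - step * g[0], x[1] - step * g[1]]
    return max_iter, x


def preconditioned_step(A, b, x0):
    """One step x - P grad f(x) with P = A^{-1}, the inverse Hessian."""
    d = matvec(inv2(A), grad(A, b, x0))
    return [x0[0] - d[0], x0[1] - d[1]]


# Hypothetical SPD matrix with eigenvalues 100 and 1 (condition number 100).
A = [[50.5, 49.5], [49.5, 50.5]]
b = [50.5, 49.5]  # chosen so that the minimizer is x* = A^{-1} b = (1, 0)

k, x_gd = gd_steps(A, b, [0.0, 0.0], step=1.0 / 100.0)
x_pc = preconditioned_step(A, b, [0.0, 0.0])
print("gradient descent iterations:", k, "->", x_gd)
print("inverse-Hessian step:", x_pc)
```

Here plain gradient descent needs well over a thousand iterations, because its rate is governed by the condition number. The inverse-Hessian step reaches the minimizer at once. As the abstract notes, practical methods replace A^{-1} with low-rank or stochastic approximations.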
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Numerical methods in inverse problems
