Asymptotic Analysis of Conditioned Stochastic Gradient Descent
Rémi Leluc, François Portier

TL;DR
This paper gives an asymptotic analysis of Conditioned SGD algorithms in a discrete-time framework using martingale techniques. It establishes their convergence and shows that conditioning with an estimate of the inverse Hessian is asymptotically optimal.
Contribution
It introduces a general framework for analyzing Conditioned SGD, establishing weak and almost sure convergence, and highlights asymptotic optimality with inverse Hessian conditioning.
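For orientation, a conditioned SGD step can be written as below. The notation is assumed here for illustration and is not taken from this summary: $\theta_k$ is the iterate, $\gamma_{k+1}$ the step size, $C_k$ the conditioning matrix, and $g(\theta_k, \xi_{k+1})$ a stochastic gradient built from the random sample $\xi_{k+1}$.

```latex
\theta_{k+1} = \theta_k - \gamma_{k+1}\, C_k\, g(\theta_k, \xi_{k+1})
```

Under this notation, $C_k = I$ recovers plain SGD, while choosing $C_k$ as an estimate of the inverse Hessian $H^{-1}$ gives the second-order case for which asymptotic optimality is claimed.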
Findings
Weak convergence of rescaled iterates established
Almost sure convergence results derived
Asymptotic normality linked to stochastic equicontinuity
Abstract
In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called Conditioned SGD, based on a preconditioning of the gradient direction. Using a discrete-time approach with martingale tools, we establish, under mild assumptions, the weak convergence of the rescaled sequence of iterates for a broad class of conditioning matrices, including stochastic first-order and second-order methods. Almost sure convergence results, which may be of independent interest, are also presented. Interestingly, the asymptotic normality result consists in a stochastic equicontinuity property, so that when the conditioning matrix is an estimate of the inverse Hessian, the algorithm is asymptotically optimal.
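To make the idea of preconditioning the gradient direction concrete, here is a minimal sketch on a toy least-squares problem. Everything specific in it is an assumption chosen for illustration rather than the authors' setup: the function name `run_conditioned_sgd`, the feature scales, the step size `1 / (k + 100)`, and the running average of `x x^T` used as a Hessian estimate.

```python
import numpy as np

def run_conditioned_sgd(n_steps=20000, use_hessian=True, seed=0):
    """Toy conditioned SGD on least squares (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    d = 3
    theta_star = np.array([1.0, -2.0, 0.5])   # hypothetical true parameter
    scales = np.array([1.0, 3.0, 0.3])        # unequal feature scales -> ill-conditioned Hessian
    theta = np.zeros(d)
    # Running sum of x x^T, started at the identity so the average is invertible.
    hess_sum = np.eye(d)
    for k in range(1, n_steps + 1):
        # C_k is built only from samples seen before step k.
        if use_hessian:
            C = np.linalg.inv(hess_sum / k)   # inverse of the averaged Hessian estimate
        else:
            C = np.eye(d)                     # identity conditioning: plain SGD
        x = scales * rng.standard_normal(d)
        y = x @ theta_star + 0.1 * rng.standard_normal()
        g = (x @ theta - y) * x               # stochastic gradient of 0.5 * (x.theta - y)^2
        gamma = 1.0 / (k + 100)               # decreasing step size (assumed schedule)
        theta = theta - gamma * (C @ g)       # conditioned SGD update
        hess_sum += np.outer(x, x)            # refresh the Hessian estimate with the new sample
    return theta, theta_star

if __name__ == "__main__":
    for flag in (True, False):
        theta, star = run_conditioned_sgd(use_hessian=flag)
        print("inverse-Hessian" if flag else "identity", np.linalg.norm(theta - star))
```

In this toy setting the Hessian is the feature covariance, so the inverse-Hessian conditioner rescales each direction by its curvature, whereas the identity conditioner converges slowly along the low-curvature direction for the same step-size schedule.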
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Random Matrices and Applications · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
