Derivatives of Stochastic Gradient Descent in parametric optimization
Franck Iutzeler, Edouard Pauwels, Samuel Vaiter

TL;DR
This paper analyzes how the derivatives of stochastic gradient descent (SGD) iterates with respect to a problem parameter behave. It establishes their convergence and stability under constant and vanishing step-size regimes, with theoretical proofs and numerical validation.
Contribution
The paper shows that the derivatives of the SGD iterates follow an inexact SGD recursion on a different objective, perturbed by the convergence of the original SGD, and uses this to connect them to the derivative of the solution mapping in parametric stochastic optimization.
Findings
Derivatives of SGD converge in mean squared error to the derivative of the solution mapping when the objective is strongly convex.
With constant step-sizes, derivatives stabilize within a noise ball centered at the solution derivative.
With vanishing step-sizes, derivatives converge at an $O\left(\frac{\log(k)^2}{k}\right)$ rate.
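The two step-size regimes can be illustrated numerically. The following is a minimal sketch, not the paper's experiment: it assumes a ridge-regularized least-squares objective with a scalar regularization parameter `theta`, synthetic data, and hand-picked step-size schedules. All names (`solution`, `sgd_with_derivative`, `mse_constant`, `mse_vanishing`) are hypothetical. The derivative iterate is obtained by differentiating the SGD update with respect to `theta`, and it is compared against the closed-form derivative of the minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, theta = 200, 3, 1.0  # assumed problem sizes and parameter value

# Synthetic least-squares data; rows are normalized so each sample
# objective is (1 + theta)-smooth and the step-sizes below are stable.
A = rng.normal(size=(n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
H = A.T @ A / n


def solution(t):
    """Closed-form minimizer of F(x, t) = ||Ax - b||^2 / (2n) + t ||x||^2 / 2."""
    return np.linalg.solve(H + t * np.eye(d), A.T @ b / n)


x_star = solution(theta)
# Implicit function theorem: dx*/dtheta = -(H + theta I)^{-1} x*.
dx_star = -np.linalg.solve(H + theta * np.eye(d), x_star)


def sgd_with_derivative(step, n_iter=20000, tail=1000, seed=1):
    """Run SGD on x and propagate dx = dx_k/dtheta using the same samples.

    Returns the squared derivative error averaged over the last `tail` iterates.
    """
    sample_rng = np.random.default_rng(seed)
    x = np.zeros(d)
    dx = np.zeros(d)  # the initial point does not depend on theta
    errs = []
    for k in range(n_iter):
        i = sample_rng.integers(n)
        a, eta = A[i], step(k)
        # Stochastic gradient of the sampled objective at x_k.
        grad = a * (a @ x - b[i]) + theta * x
        # Derivative of that gradient w.r.t. theta: Hessian applied to dx,
        # plus the cross derivative, which equals x_k for this objective.
        dgrad = a * (a @ dx) + theta * dx + x
        x, dx = x - eta * grad, dx - eta * dgrad
        if k >= n_iter - tail:
            errs.append(np.sum((dx - dx_star) ** 2))
    return float(np.mean(errs))


mse_constant = sgd_with_derivative(lambda k: 0.1)
mse_vanishing = sgd_with_derivative(lambda k: 1.0 / (k + 10))
print(f"constant step-size:  {mse_constant:.2e}")
print(f"vanishing step-size: {mse_vanishing:.2e}")
```

Under these assumptions, the constant step-size run should plateau at a nonzero error level, while the vanishing schedule should keep shrinking the error as the iteration count grows. The sketch only illustrates the qualitative behavior and does not verify the stated rate.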
Abstract
We consider stochastic optimization problems where the objective depends on some parameter, as commonly found in hyperparameter optimization for instance. We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD. This enables us to establish that the derivatives of SGD converge to the derivative of the solution mapping in terms of mean squared error whenever the objective is strongly convex. Specifically, we demonstrate that with constant step-sizes, these derivatives stabilize within a noise ball centered at the solution derivative, and that with vanishing step-sizes they exhibit convergence rates. Additionally, we prove exponential convergence in the…
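The recursion described in the abstract can be sketched by applying the chain rule to the SGD update. The notation here is an assumption and may differ from the paper's: $f(x, \theta; \xi)$ is the sampled objective, $F(x, \theta) = \mathbb{E}_\xi[f(x, \theta; \xi)]$ its expectation, $\eta_k$ the step-size, $x_k(\theta)$ the SGD iterate, and $x^\star(\theta)$ the minimizer of $F(\cdot, \theta)$.

```latex
\begin{aligned}
x_{k+1}(\theta) &= x_k(\theta) - \eta_k \nabla_x f(x_k(\theta), \theta; \xi_{k+1}), \\
\partial_\theta x_{k+1}(\theta) &= \partial_\theta x_k(\theta)
  - \eta_k \left[ \nabla^2_{xx} f(x_k(\theta), \theta; \xi_{k+1})\, \partial_\theta x_k(\theta)
  + \nabla^2_{x\theta} f(x_k(\theta), \theta; \xi_{k+1}) \right], \\
\partial_\theta x^\star(\theta) &= -\left[ \nabla^2_{xx} F(x^\star(\theta), \theta) \right]^{-1}
  \nabla^2_{x\theta} F(x^\star(\theta), \theta).
\end{aligned}
```

One way to read the second line: it is a stochastic gradient step on the quadratic $D \mapsto \tfrac{1}{2} \langle \nabla^2_{xx} F\, D, D \rangle + \langle \nabla^2_{x\theta} F, D \rangle$, whose minimizer is the solution derivative given in the third line (by the implicit function theorem). The step is inexact because the second derivatives are evaluated at $x_k(\theta)$ rather than at $x^\star(\theta)$, and this error shrinks as the original SGD sequence converges.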
Taxonomy
Topics: Random Matrices and Applications · Point processes and geometric inequalities · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
