Derivatives of Stochastic Gradient Descent in parametric optimization

Franck Iutzeler; Edouard Pauwels; Samuel Vaiter

arXiv:2405.15894 · math.OC · November 21, 2024

Open Access

TL;DR

This paper analyzes how the derivatives of stochastic gradient descent (SGD) iterates with respect to a problem parameter behave. It establishes their convergence and stability under constant and vanishing step-size regimes, with theoretical proofs and numerical validation.

Contribution

It shows that the derivatives of the SGD iterates follow an inexact SGD recursion on a different objective, establishes their convergence and stability, and relates them to the derivative of the solution mapping in parametric stochastic optimization.
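
As a reading aid, the recursion on the derivatives can be written out with the chain rule. The notation below ($f$, $F$, $\theta$, $\xi_k$, $\eta_k$, $x^\star$) is assumed for illustration and is not taken from this page; it is a sketch of the standard setting, not necessarily the paper's exact formulation. For the SGD step

$$
x_{k+1}(\theta) = x_k(\theta) - \eta_k \nabla_x f(x_k(\theta), \theta; \xi_k),
$$

differentiating both sides with respect to $\theta$ gives

$$
\partial_\theta x_{k+1}(\theta) = \partial_\theta x_k(\theta) - \eta_k \left( \nabla^2_{xx} f(x_k(\theta), \theta; \xi_k)\, \partial_\theta x_k(\theta) + \nabla^2_{x\theta} f(x_k(\theta), \theta; \xi_k) \right).
$$

When $F(x, \theta) = \mathbb{E}_\xi[f(x, \theta; \xi)]$ is strongly convex in $x$ with minimizer $x^\star(\theta)$, the implicit function theorem gives the target of this recursion:

$$
\partial_\theta x^\star(\theta) = -\left[ \nabla^2_{xx} F(x^\star(\theta), \theta) \right]^{-1} \nabla^2_{x\theta} F(x^\star(\theta), \theta).
$$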

Findings

01

The derivatives of the SGD iterates converge in mean squared error to the derivative of the solution mapping when the objective is strongly convex.

02

With constant step-sizes, the derivatives stabilize within a noise ball centered at the solution derivative.

03

With vanishing step-sizes, derivatives converge at an $O(\log(k)^{2} / k)$ rate.

Abstract

We consider stochastic optimization problems where the objective depends on some parameter, as commonly found in hyperparameter optimization for instance. We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD. This enables us to establish that the derivatives of SGD converge to the derivative of the solution mapping in terms of mean squared error whenever the objective is strongly convex. Specifically, we demonstrate that with constant step-sizes, these derivatives stabilize within a noise ball centered at the solution derivative, and that with vanishing step-sizes they exhibit $O(\log(k)^{2} / k)$ convergence rates. Additionally, we prove exponential convergence in the…
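
The behavior described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' code or experiment: it assumes a ridge regression objective, $f(x, \theta; i) = \tfrac{1}{2}(a_i^\top x - b_i)^2 + \tfrac{\theta}{2}\|x\|^2$, where the regularization weight $\theta$ plays the role of the parameter. The data, step-size schedules, and function names are all hypothetical choices made for illustration. It propagates the derivative of the SGD iterate alongside the iterate itself and compares it to the closed-form derivative of the solution.

```python
import numpy as np


def solution_and_derivative(A, b, theta):
    """Closed-form ridge solution x*(theta) and its derivative in theta."""
    n, d = A.shape
    H = A.T @ A / n + theta * np.eye(d)    # Hessian of the averaged objective
    x_star = np.linalg.solve(H, A.T @ b / n)
    dx_star = -np.linalg.solve(H, x_star)  # implicit differentiation of H x* = c
    return x_star, dx_star


def sgd_with_derivative(A, b, theta, step, n_iter, seed=0):
    """Run SGD on the ridge objective and propagate d x_k / d theta."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)    # iterate x_k (x_0 does not depend on theta)
    dx = np.zeros(d)   # derivative of x_k with respect to theta
    for k in range(n_iter):
        i = rng.integers(n)
        a, bi = A[i], b[i]
        # Stochastic gradient in x of the sampled ridge term.
        grad = a * (a @ x - bi) + theta * x
        # Derivative of that gradient in theta along the iterate:
        # (a a^T + theta I) dx  +  x, the second term being the cross derivative.
        dgrad = a * (a @ dx) + theta * dx + x
        eta = step(k)
        x, dx = x - eta * grad, dx - eta * dgrad
    return x, dx


# Hypothetical synthetic data, chosen only for this illustration.
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((200, 5))
x_true = data_rng.standard_normal(5)
b = A @ x_true + 0.1 * data_rng.standard_normal(200)
theta = 0.5

x_star, dx_star = solution_and_derivative(A, b, theta)

# Constant step-size versus a step-size decaying roughly like 1/k.
_, dx_const = sgd_with_derivative(A, b, theta, lambda k: 0.01, 20000)
_, dx_vanish = sgd_with_derivative(
    A, b, theta, lambda k: 0.02 / (1.0 + 0.01 * k), 20000
)

err_init = np.linalg.norm(dx_star)  # error of the initial derivative (zero)
err_const = np.linalg.norm(dx_const - dx_star)
err_vanish = np.linalg.norm(dx_vanish - dx_star)

print("initial derivative error:   ", err_init)
print("constant step-size error:   ", err_const)
print("vanishing step-size error:  ", err_vanish)
```

On a problem like this one would expect the constant step-size error to plateau at a level set by the step-size, and the vanishing step-size error to keep shrinking as iterations accumulate. A single run with one seed is only suggestive and does not verify the stated rates.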

Taxonomy

Topics: Random Matrices and Applications · Point processes and geometric inequalities · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent