Demystifying SGD with Doubly Stochastic Gradients

Kyurae Kim; Joohwan Ko; Yi-An Ma; Jacob R. Gardner

arXiv:2406.00920·stat.ML·May 13, 2025

TL;DR

This paper analyzes the convergence of SGD with doubly stochastic gradients (doubly SGD) on objectives that are finite sums of intractable expectations, clarifying both its theoretical guarantees and how to spend computation efficiently.

Contribution

It establishes convergence results for doubly SGD under general conditions, including dependent estimators, and analyzes how to optimally allocate computational resources.

Findings

01

Convergence of doubly SGD is proven under broad conditions.

02

Random reshuffling improves the complexity dependence on the subsampling noise.

03

The analysis gives guidance on how to divide a computational budget between minibatch size and Monte Carlo samples per component.

Abstract

Optimization objectives in the form of a sum of intractable expectations are rising in importance (e.g., diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ SGD with doubly stochastic gradients (doubly SGD): the expectations are estimated using the gradient estimator of each component, while the sum is estimated by subsampling over these estimators. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong assumptions such as bounded variance. In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompasses dependent component gradient estimators. In particular, for dependent estimators, our analysis allows fine-grained analysis of…
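To make the two sources of randomness concrete, here is a minimal sketch of doubly SGD on a toy problem. It is not the authors' code or experimental setup. The quadratic components `f_i(x; eta) = 0.5 * ||x - (a_i + eta)||^2`, the Gaussian noise, the `1/(t+2)` step size, and all function and parameter names are assumptions chosen for illustration. Each step draws a fresh minibatch of components without replacement (one possible reading of independent minibatching), then averages a few Monte Carlo gradient samples per selected component. Random reshuffling is not shown.

```python
import numpy as np

def doubly_stochastic_grad(x, a, rng, batch_size, mc_samples, noise_std):
    """One doubly stochastic gradient for the toy objective
    F(x) = (1/n) * sum_i E_eta[0.5 * ||x - (a_i + eta)||^2]."""
    n, d = a.shape
    # First source of noise: subsample components of the finite sum.
    idx = rng.choice(n, size=batch_size, replace=False)
    # Second source of noise: Monte Carlo samples for each expectation.
    eta = rng.normal(0.0, noise_std, size=(batch_size, mc_samples, d))
    # The gradient of 0.5 * ||x - (a_i + eta)||^2 w.r.t. x is x - a_i - eta.
    grads = x[None, None, :] - a[idx][:, None, :] - eta
    # Average over both the minibatch and the Monte Carlo samples.
    return grads.mean(axis=(0, 1))

def doubly_sgd(a, steps=2000, batch_size=4, mc_samples=2,
               noise_std=1.0, seed=0):
    """Run SGD with doubly stochastic gradients from x = 0."""
    rng = np.random.default_rng(seed)
    x = np.zeros(a.shape[1])
    for t in range(steps):
        step_size = 1.0 / (t + 2)  # decaying step size (illustrative choice)
        g = doubly_stochastic_grad(x, a, rng, batch_size,
                                   mc_samples, noise_std)
        x = x - step_size * g
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(size=(50, 3))  # hypothetical per-component targets
    x_hat = doubly_sgd(a)
    # For this toy objective the minimizer is the mean of the a_i.
    print("distance to minimizer:", np.linalg.norm(x_hat - a.mean(axis=0)))
```

In this sketch, `batch_size` controls the subsampling noise and `mc_samples` controls the Monte Carlo noise, so varying one while holding their product fixed gives a rough feel for the budget trade-off the paper studies.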

Taxonomy

Topics: Stochastic processes and financial applications · MRI in cancer diagnosis

Methods: Stochastic Gradient Descent · Diffusion