Demystifying SGD with Doubly Stochastic Gradients
Kyurae Kim, Joohwan Ko, Yi-An Ma, Jacob R. Gardner

TL;DR
This paper analyzes the convergence of SGD with doubly stochastic gradients (doubly SGD) on objectives that are finite sums of intractable expectations, and draws out what the theory implies for practical efficiency.
Contribution
It establishes convergence results for doubly SGD under general conditions, including dependent component gradient estimators, and analyzes how to allocate a computational budget between the minibatch size and the number of Monte Carlo samples.
Findings
Convergence of doubly SGD is proven under broad conditions.
Random reshuffling improves complexity dependence.
The analysis gives guidance on how to allocate computation between minibatch size and Monte Carlo samples.
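To make the allocation question concrete, the sketch below measures the empirical variance of a doubly stochastic gradient estimate on a toy problem while the product of minibatch size and Monte Carlo samples is held fixed. Everything here is an illustrative assumption rather than the paper's setup: the quadratic components, the names `estimator_variance`, `b`, `m`, and `budget`, and the choice of independent Gaussian noise per component. The numbers describe only this toy and should not be read as the paper's recommendation.

```python
import numpy as np

def estimator_variance(a, x, b, m, sigma=1.0, trials=5000, seed=0):
    """Empirical variance of the doubly stochastic gradient estimate at x.

    Assumed toy objective: F(x) = (1/n) * sum_i E[0.5 * (x - a_i - sigma*eta)^2]
    with eta ~ N(0, 1). Each component gradient sample is x - a_i - sigma*eta.
    """
    rng = np.random.default_rng(seed)
    n = len(a)
    samples = np.empty(trials)
    for t in range(trials):
        idx = rng.choice(n, size=b, replace=False)  # subsample b components
        eta = rng.standard_normal((b, m))           # m Monte Carlo draws each
        samples[t] = (x - a[idx][:, None] - sigma * eta).mean()
    return samples.var()

a = np.linspace(-1.0, 3.0, 16)  # hypothetical per-component targets
budget = 8                      # fixed per-iteration budget b * m
variances = {
    (b, budget // b): estimator_variance(a, x=0.0, b=b, m=budget // b)
    for b in (1, 2, 4, 8)
}
for (b, m), v in variances.items():
    print(f"b={b}, m={m}: variance ~ {v:.3f}")
```

In this toy the Monte Carlo noise is independent across components, so its contribution depends only on the product `b * m`, while the subsampling contribution shrinks as `b` grows. With dependent component estimators the trade-off can differ, which is the case the paper's analysis addresses.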
Abstract
Optimization objectives in the form of a sum of intractable expectations are rising in importance (e.g., diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ SGD with doubly stochastic gradients (doubly SGD): the expectations are estimated using the gradient estimator of each component, while the sum is estimated by subsampling over these estimators. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong assumptions such as bounded variance. In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompasses dependent component gradient estimators. In particular, for dependent estimators, our analysis allows fine-grained analysis of…
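The two layers of randomness described above can be sketched on a small example. This is a minimal illustration under stated assumptions, not the authors' implementation: the quadratic objective, the function names `component_grad`, `doubly_stochastic_grad`, and `doubly_sgd`, and all hyperparameters are hypothetical. The inner layer averages `m` Monte Carlo gradient samples per component, and the outer layer averages over a minibatch of `b` components drawn either by independent minibatching or by random reshuffling.

```python
import numpy as np

def component_grad(x, a_i, eta, sigma=1.0):
    # Gradient in x of the assumed component loss 0.5 * (x - a_i - sigma*eta)^2.
    return x - a_i - sigma * eta

def doubly_stochastic_grad(x, a, idx, m, rng):
    # Inner randomness: m Monte Carlo draws of eta for each selected component.
    eta = rng.standard_normal((len(idx), m))
    grads = component_grad(x, a[idx][:, None], eta)  # shape (b, m)
    # Average over Monte Carlo samples and over the minibatch.
    return grads.mean()

def doubly_sgd(a, b=4, m=2, epochs=200, step=0.05, reshuffle=True, seed=0):
    rng = np.random.default_rng(seed)
    n = len(a)
    x = 0.0
    for _ in range(epochs):
        if reshuffle:
            # Random reshuffling: one permutation per epoch, split into batches.
            perm = rng.permutation(n)
            batches = [perm[k:k + b] for k in range(0, n, b)]
        else:
            # Independent minibatching: a fresh subsample at every step.
            batches = [rng.choice(n, size=b, replace=False) for _ in range(n // b)]
        for idx in batches:
            x -= step * doubly_stochastic_grad(x, a, idx, m, rng)
    return x

a = np.linspace(-1.0, 3.0, 16)  # hypothetical per-component targets
x_rr = doubly_sgd(a, reshuffle=True)
x_ind = doubly_sgd(a, reshuffle=False)
print(x_rr, x_ind, a.mean())
```

For this toy objective the minimizer is the mean of `a`, so both variants should end near `a.mean()`, up to noise from the constant step size.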
Taxonomy
Topics: Stochastic processes and financial applications · MRI in cancer diagnosis
Methods: Stochastic Gradient Descent · Diffusion
