Mini-Batch Covariance, Diffusion Limits, and Oracle Complexity in Stochastic Gradient Descent: A Sampling-Design Perspective
Daniel Zantedeschi, Kumar Muthuraman

TL;DR
This paper analyzes mini-batch SGD noise as a sampling design, deriving diffusion limits and oracle complexity bounds with theoretical and empirical validation.
Contribution
It introduces a sampling-design perspective on mini-batch covariance, providing new diffusion limit results and oracle complexity guarantees for SGD.
Findings
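Under exchangeable fresh-sampling mini-batches of size b, the conditional covariance of the mini-batch gradient noise, given the de Finetti directing measure mu, is b^{-1} G_mu(theta).
For constant-step SGD, the sqrt(b/eta)-scaled fluctuations satisfy a functional CLT with noise covariance G*; near a nondegenerate optimum the limit is Ornstein-Uhlenbeck.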
Oracle complexity bounds depend on effective dimension and condition number.
Abstract
Stochastic gradient descent (SGD) is central to simulation optimization, stochastic programming, and online M-estimation, where sampling effort is a decision variable. We study the mini-batch gradient noise as a sampling-design object. Under exchangeable fresh-sampling mini-batches, the conditional covariance given the de Finetti directing measure mu is b^{-1} G_mu(theta), and under identifiability the projected population object is b^{-1} G*(theta) -- projected Fisher information for correctly specified likelihoods, the sandwich partner of the Hessian otherwise. This identification fixes the noise matrix entering the diffusion analysis of constant-step SGD: the raw iterate path has a deterministic fluid limit, and the sqrt(b/eta)-scaled fluctuations satisfy a functional CLT with noise covariance G*; near a nondegenerate optimum the limit is Ornstein-Uhlenbeck, and its Lyapunov…
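The b^{-1} scaling of the mini-batch noise covariance stated in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only and is not taken from the paper. It assumes a synthetic least-squares loss, treats the empirical measure of a fixed dataset as the sampling distribution, and draws each mini-batch i.i.d. with replacement as a stand-in for fresh sampling. All names (`per_sample_grads`, `minibatch_grad_cov`, `ratios`) are hypothetical.

```python
import numpy as np

def per_sample_grads(theta, X, y):
    # Gradient of 0.5 * (x^T theta - y)^2 with respect to theta, one row per sample.
    resid = X @ theta - y
    return resid[:, None] * X

def minibatch_grad_cov(theta, X, y, b, n_draws, rng):
    # Empirical covariance of the mini-batch mean gradient over n_draws batches.
    # Assumption: each batch is drawn i.i.d. with replacement from the dataset.
    g = per_sample_grads(theta, X, y)
    idx = rng.integers(0, X.shape[0], size=(n_draws, b))
    batch_means = g[idx].mean(axis=1)  # shape (n_draws, d)
    return np.cov(batch_means, rowvar=False)

rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])  # hypothetical ground truth
y = X @ theta_true + rng.normal(size=n)
theta = np.zeros(d)  # point at which the noise is evaluated

# Per-sample gradient covariance under the empirical measure (plays the role of G).
G = np.cov(per_sample_grads(theta, X, y), rowvar=False, bias=True)

ratios = {}
for b in (4, 16, 64):
    C = minibatch_grad_cov(theta, X, y, b, 20000, rng)
    # If Cov(mini-batch gradient) = G / b, then b * tr(C) / tr(G) should be close to 1.
    ratios[b] = b * np.trace(C) / np.trace(G)
    print(f"b={b:3d}  b*tr(C)/tr(G) = {ratios[b]:.3f}")
```

Under these assumptions the printed ratios should sit near 1 for every batch size, up to Monte Carlo error. Sampling without replacement from a finite dataset would add a finite-population correction, which this sketch deliberately avoids.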
