Stochastic gradient with least-squares control variates
Fabio Nobile, Matteo Raviola, Nathan Schaeffer

TL;DR
This paper introduces a variance reduction technique for stochastic gradient descent (SGD) based on least-squares control variates. It targets objectives defined as an expectation over continuously distributed random variables, so it does not require the finite-sum structure that methods such as SAGA rely on.
Contribution
A control variate for SGD built by fitting a linear model to past gradient evaluations with weighted discrete least squares, which extends variance reduction beyond finite-sum settings to expectation-based objectives.
Findings
The authors establish sublinear convergence guarantees for the method on strongly convex problems.
Numerical experiments show improved convergence on PDE-constrained optimization tasks.
The approach maintains computational efficiency comparable to standard SGD.
Abstract
The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by leveraging stored gradient information; however, they are restricted to settings where the objective functional is a finite sum, and their performance degrades when the number of terms in the sum is large. In this work, we propose a novel approach which is well suited when the objective is given by an expectation over random variables with a continuous probability distribution. Our method constructs a control variate by fitting a linear model to past gradient evaluations using weighted discrete least-squares, effectively reducing variance while preserving computational efficiency. We establish theoretical sublinear convergence guarantees for strongly convex…
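To make the control-variate idea in the abstract concrete, here is a minimal Python sketch; it is not the authors' algorithm. Everything specific in it is an assumption chosen for illustration: the toy objective F(x, xi) = 0.5 (1 + 0.5 xi) ||x - xi||^2 with xi ~ Uniform(-1, 1), the three-term Legendre basis in xi, the geometric weights, the sliding window, and all function names. The sketch fits past gradient samples onto basis functions whose means are known exactly, then corrects the new sample as g - model(xi) + E[model]. The correction stays unbiased because the fit uses only past samples.

```python
import numpy as np

# Exact means of the basis functions below under xi ~ Uniform(-1, 1).
BASIS_MEAN = np.array([1.0, 0.0, 0.0])

def basis(xi):
    """Legendre basis [1, xi, (3 xi^2 - 1) / 2] at samples xi in [-1, 1]."""
    xi = np.atleast_1d(np.asarray(xi, dtype=float))
    return np.stack([np.ones_like(xi), xi, 0.5 * (3.0 * xi**2 - 1.0)], axis=1)

def fit_control_variate(xi_hist, grad_hist, weights):
    """Weighted least-squares fit of past gradients onto the basis."""
    sw = np.sqrt(np.asarray(weights, dtype=float))[:, None]
    A = basis(xi_hist) * sw
    B = np.asarray(grad_hist, dtype=float) * sw
    coef, *_ = np.linalg.lstsq(A, B, rcond=None)
    return coef  # shape (3, d)

def cv_gradient(grad, xi, coef):
    """Return grad - model(xi) + E[model]; coef must not depend on xi."""
    return grad - basis(xi)[0] @ coef + BASIS_MEAN @ coef

def sample_gradient(x, xi):
    """Toy stochastic gradient of 0.5 * (1 + 0.5 xi) * ||x - xi||^2 (assumed)."""
    return (1.0 + 0.5 * xi) * (x - xi)

def run(n_iter=200, window=20, step=0.1, min_hist=5, seed=0):
    """SGD loop that switches to the corrected gradient once history exists."""
    rng = np.random.default_rng(seed)
    x = np.array([2.0, -1.0])
    xi_hist, grad_hist = [], []
    for _ in range(n_iter):
        xi = rng.uniform(-1.0, 1.0)
        g = sample_gradient(x, xi)
        if len(xi_hist) >= min_hist:
            # Geometric weights: the newest stored sample gets weight 1.
            w = 0.9 ** np.arange(len(xi_hist))[::-1]
            coef = fit_control_variate(xi_hist, grad_hist, w)
            direction = cv_gradient(g, xi, coef)
        else:
            direction = g  # plain SGD until enough samples are stored
        x = x - step * direction
        # Store the raw sample only after it has been used.
        xi_hist = (xi_hist + [xi])[-window:]
        grad_hist = (grad_hist + [g])[-window:]
    return x
```

In this toy problem the gradient at a fixed x is a quadratic polynomial in xi, so the fitted model reproduces it exactly and the corrected estimate equals the true mean gradient x - 1/6 for any new sample. During the actual iteration the stored gradients come from different iterates, so the fit is only approximate; the weights and window are one simple way to favor recent information.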
