Stochastic gradient with least-squares control variates

Fabio Nobile; Matteo Raviola; Nathan Schaeffer

arXiv:2507.20981·math.OC·November 21, 2025

TL;DR

This paper introduces a variance reduction technique for stochastic gradient descent (SGD) based on least-squares control variates. It improves convergence for objectives defined as expectations and does not require a finite-sum structure.

Contribution

It proposes a least-squares control variate for SGD, built from past gradient evaluations, for objectives given by an expectation over continuously distributed random variables. This extends variance reduction beyond the finite-sum setting of methods such as SAGA.
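The control variate idea is easiest to see when the goal is just to estimate an expectation, before any SGD is involved. The sketch below fits a model h to an integrand by ordinary least squares on a pilot sample and then averages φ(ξ) − h(ξ) + E[h] instead of φ(ξ). The integrand, the quadratic basis, the standard normal ξ, the function name `ls_control_variate_mean` and the separate pilot sample are all illustrative assumptions. They are not the paper's construction; the abstract describes a *weighted* fit to past gradient evaluations.

```python
import numpy as np


def ls_control_variate_mean(phi, n_fit, n_eval, seed=0):
    """Estimate E[phi(xi)], xi ~ N(0, 1), with a least-squares control variate.

    Hypothetical setup: h(xi) = c0 + c1*xi + c2*xi**2, fitted by ordinary
    least squares on an independent pilot sample so the estimate stays unbiased.
    """
    rng = np.random.default_rng(seed)

    def basis(xi):
        # Columns 1, xi, xi^2: a linear-in-coefficients model.
        return np.column_stack([np.ones_like(xi), xi, xi**2])

    # Fit h on pilot samples that are independent of the evaluation samples.
    xi_fit = rng.standard_normal(n_fit)
    coef, *_ = np.linalg.lstsq(basis(xi_fit), phi(xi_fit), rcond=None)

    # Exact basis moments under N(0, 1): E[1] = 1, E[xi] = 0, E[xi^2] = 1.
    mean_h = coef @ np.array([1.0, 0.0, 1.0])

    xi = rng.standard_normal(n_eval)
    plain = phi(xi)                                # plain Monte Carlo samples
    corrected = plain - basis(xi) @ coef + mean_h  # control-variate samples
    return plain.mean(), corrected.mean(), plain.var(), corrected.var()


m_plain, m_cv, v_plain, v_cv = ls_control_variate_mean(np.exp, 1000, 10000)
```

With `phi = np.exp`, the exact mean is e^{1/2}. Both estimators target that value. In this example, the corrected samples have a much smaller variance because the quadratic model captures most of the fluctuation of exp(ξ).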

Findings

1. The authors establish sublinear convergence guarantees for strongly convex problems.
2. Numerical experiments show improved convergence on PDE-constrained optimization tasks.
3. The approach keeps a computational cost comparable to standard SGD.

Abstract

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by leveraging stored gradient information; however, they are restricted to settings where the objective functional is a finite sum, and their performance degrades when the number of terms in the sum is large. In this work, we propose a novel approach which is well suited when the objective is given by an expectation over random variables with a continuous probability distribution. Our method constructs a control variate by fitting a linear model to past gradient evaluations using weighted discrete least-squares, effectively reducing variance while preserving computational efficiency. We establish theoretical sublinear convergence guarantees for strongly convex…
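Below is a rough sketch of how a least-squares control variate might plug into an SGD loop. At each step, an affine model of the gradient in ξ is fitted to the stored pairs (ξ_j, ∇f(x_j, ξ_j)) by weighted least squares. The step then uses g − h(ξ) + E[h] instead of the raw gradient g. The test problem `grad_f`, the affine basis, the geometric weights `rho` that down-weight gradients from older iterates, the sliding `window`, and the `warmup` length are all hypothetical choices. They are not the authors' algorithm, and the truncated abstract does not specify them.

```python
import numpy as np


def grad_f(x, xi):
    # Hypothetical integrand f(x, xi) = 0.5*||x - xi||^2 + 0.25*xi[0]**2 * ||x||^2,
    # whose gradient in x is nonlinear in the random variable xi.
    return x - xi + 0.5 * xi[0] ** 2 * x


def ls_cv_sgd(x0, mu, sigma, steps, lr, rho=0.95, window=100, warmup=20,
              use_cv=True, seed=0):
    """SGD on F(x) = E[f(x, xi)], xi ~ N(mu, sigma^2 I), optionally corrected
    by a weighted least-squares control variate (assumed design, see text)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    xis, grads = [], []  # sliding window of past samples and gradients
    for k in range(steps):
        xi = mu + sigma * rng.standard_normal(d)
        g = grad_f(x, xi)
        est = g  # plain SGD estimator
        if use_cv and len(xis) >= max(warmup, d + 1):
            # Affine model h(xi) = c0 + C1 @ xi fitted to past pairs only,
            # so it is independent of the current sample xi.
            A = np.hstack([np.ones((len(xis), 1)), np.array(xis)])
            ages = np.arange(len(xis))[::-1]   # 0 = most recent sample
            sw = np.sqrt(rho ** ages)[:, None]  # sqrt of geometric weights
            coef, *_ = np.linalg.lstsq(sw * A, sw * np.array(grads), rcond=None)
            C1 = coef[1:].T                     # (d, d) slope of h in xi
            # g - h(xi) + E[h(xi)] with E[xi] = mu; the intercept c0 cancels.
            est = g + C1 @ (mu - xi)
        xis.append(xi)
        grads.append(g)
        xis, grads = xis[-window:], grads[-window:]
        x = x - lr(k) * est
    return x
```

For ξ ~ N(μ, I) with μ = (1, −1), E[ξ_0²] = 2, so the minimizer is x* = μ/2. This makes it easy to compare `use_cv=True` against plain SGD over a few seeds. Because the model is fitted only on past samples, the corrected direction is still an unbiased gradient estimate given the history. Only its variance changes.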