Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles
Yifan Hu, Jie Wang, Xin Chen, Niao He

TL;DR
This paper introduces multi-level Monte Carlo (MLMC) gradient methods for stochastic optimization with biased oracles, achieving lower sample and computational complexity than the widely used biased stochastic gradient method for strongly convex, convex, and nonconvex objectives.
Contribution
The paper develops and analyzes a family of MLMC gradient methods that exploit the tradeoff among bias, variance, and oracle cost, and demonstrates their advantages over existing biased stochastic gradient approaches.
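The tradeoff among bias, variance, and cost can be made concrete with the standard MLMC construction. What follows is a minimal sketch, not necessarily the paper's exact estimators. It assumes a hypothetical biased gradient estimator $\hat g_m(x)$ built from $m$ inner samples. Its bias shrinks as $m$ grows, while its cost grows linearly in $m$. The geometric levels $m_l = 2^l$, the truncation level $L$, and the level distribution $p_l \propto 2^{-l}$ are illustrative choices.

```latex
% Telescoping identity over levels m_l = 2^l, l = 0, ..., L
\mathbb{E}\big[\hat g_{2^L}(x)\big]
  = \mathbb{E}\big[\hat g_{1}(x)\big]
  + \sum_{l=1}^{L} \mathbb{E}\big[\hat g_{2^l}(x) - \hat g_{2^{l-1}}(x)\big]
  = \sum_{l=0}^{L} \mathbb{E}\big[\Delta_l(x)\big],
\qquad \Delta_0 = \hat g_1, \quad
\Delta_l = \hat g_{2^l} - \hat g_{2^{l-1}} \ \text{(fine and coarse terms on shared samples)}.

% Randomized level: draw l with probability p_l and return v(x) = \Delta_l(x) / p_l
\mathbb{E}\big[v(x)\big] = \sum_{l=0}^{L} p_l \, \frac{\mathbb{E}[\Delta_l(x)]}{p_l}
  = \mathbb{E}\big[\hat g_{2^L}(x)\big].

% Expected inner-sample cost with p_l = 2^{-l} / Z, where Z = \sum_{j=0}^{L} 2^{-j} = 2 - 2^{-L}
\sum_{l=0}^{L} p_l \, 2^l = \frac{L+1}{2 - 2^{-L}}.
```

Under these assumptions, $v(x)$ carries only the bias of the finest level $2^L$. Its expected cost grows linearly in $L$, that is, logarithmically in $2^L$. The price is extra variance from the $1/p_l$ weights, which coupling the fine and coarse terms on shared samples is meant to keep small.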
Findings
MLMC gradient methods achieve lower total sample and computational complexity than the widely used biased stochastic gradient method.
Combining MLMC with variance reduction techniques such as SPIDER further reduces complexity.
Numerical experiments confirm superior performance in practical applications.
Abstract
We consider stochastic optimization when one only has access to biased stochastic oracles of the objective and the gradient, and obtaining stochastic gradients with low biases comes at high costs. This setting captures various optimization paradigms, such as conditional stochastic optimization, distributionally robust optimization, shortfall risk optimization, and machine learning paradigms, such as contrastive learning. We examine a family of multi-level Monte Carlo (MLMC) gradient methods that exploit a delicate tradeoff among bias, variance, and oracle cost. We systematically study their total sample and computational complexities for strongly convex, convex, and nonconvex objectives and demonstrate their superiority over the widely used biased stochastic gradient method. When combined with the variance reduction techniques like SPIDER, these MLMC gradient methods can further reduce…
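A toy conditional stochastic optimization problem illustrates the biased-oracle setting. This is a minimal sketch under hypothetical assumptions, not the paper's experiments:

- outer samples $\xi \sim N(0,1)$ and inner samples $\eta \sim N(0,1)$;
- inner function $g(x;\xi,\eta) = x - \xi + \eta$ and outer function $f(u) = u^4/4$;
- hence $\nabla F(x) = \mathbb{E}[(x-\xi)^3] = x^3 + 3x$.

A plug-in gradient with $m$ inner samples has mean $x^3 + 3x(1 + 1/m)$. Its bias $3x/m$ only shrinks as the per-sample cost $m$ grows. The helper names `true_grad`, `biased_grad`, and `rt_mlmc_grad` are hypothetical. The MLMC variant shown is one standard choice and not necessarily an estimator analyzed in the paper. It draws a random level with $p_l \propto 2^{-l}$ and uses a coarse term that averages the two halves of the fine level's samples, which has the same expectation as a single coarse estimator.

```python
import numpy as np

# Hypothetical toy CSO instance (not from the paper):
#   xi ~ N(0, 1), eta ~ N(0, sigma_eta^2), g(x; xi, eta) = x - xi + eta, f(u) = u^4 / 4,
#   F(x) = E_xi[ f(E_eta[g]) ] = E[(x - xi)^4] / 4.


def true_grad(x):
    """Exact gradient of the toy objective: E[(x - xi)^3] = x^3 + 3x for xi ~ N(0, 1)."""
    return x**3 + 3.0 * x


def biased_grad(x, m, n, rng, sigma_eta=1.0):
    """n i.i.d. plug-in gradient samples, each using m inner samples (bias 3 x sigma_eta^2 / m)."""
    xi = rng.standard_normal(n)                       # one outer sample per gradient sample
    eta = sigma_eta * rng.standard_normal((n, m))     # m inner samples per outer sample
    g_bar = x - xi + eta.mean(axis=1)                 # estimate of E[g | xi]
    return g_bar**3                                   # f'(g_bar) * dg/dx, with f'(u) = u^3, dg/dx = 1


def rt_mlmc_grad(x, n, max_level, rng, sigma_eta=1.0):
    """n i.i.d. randomized-level MLMC gradient samples (hypothetical helper).

    Level l uses 2^l inner samples and is drawn with probability p_l proportional to 2^-l.
    The expectation equals that of the plug-in estimator with 2^max_level inner samples.
    """
    levels = np.arange(max_level + 1)
    p = 2.0 ** (-levels)
    p /= p.sum()                                      # normalize the level distribution
    drawn = rng.choice(levels, size=n, p=p)
    out = np.empty(n)
    for l in levels:
        idx = np.flatnonzero(drawn == l)
        if idx.size == 0:
            continue
        m = 2**l
        xi = rng.standard_normal(idx.size)
        eta = sigma_eta * rng.standard_normal((idx.size, m))
        fine = (x - xi + eta.mean(axis=1)) ** 3
        if l == 0:
            delta = fine                              # Delta_0: plug-in with one inner sample
        else:
            half = m // 2
            coarse_a = (x - xi + eta[:, :half].mean(axis=1)) ** 3
            coarse_b = (x - xi + eta[:, half:].mean(axis=1)) ** 3
            delta = fine - 0.5 * (coarse_a + coarse_b)  # antithetic coarse term on shared samples
        out[idx] = delta / p[l]                       # weight 1 / p_l makes the levels telescope
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = 1.0
    print("true gradient :", true_grad(x))
    print("plug-in, m = 1:", biased_grad(x, 1, 400_000, rng).mean())
    print("randomized MLMC:", rt_mlmc_grad(x, 400_000, 10, rng).mean())
```

At $x = 1$ the true gradient is 4 and the one-sample plug-in estimator has expectation 7. With `max_level=10`, the MLMC estimator has expectation $4 + 3/2^{10}$. Its expected cost is $11/(2 - 2^{-10}) \approx 5.5$ inner samples per draw, paid for with a larger variance per draw.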
Taxonomy
Topics: Stochastic Gradient Optimization Techniques
Methods: Contrastive Learning
