Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling

Daniil Medyakov; Gleb Molodtsov; Savelii Chezhegov; Alexey Rebrikov; Aleksandr Beznosikov

arXiv:2502.14648·cs.LG·January 12, 2026

Open Access

TL;DR

This paper introduces variance reduction methods for stochastic optimization that avoid costly periodic full gradient computations by combining the shuffling heuristic with SAG/SAGA-style techniques. The convergence rates match those of standard shuffling methods for non-convex objectives and improve on them under strong convexity.

Contribution

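The paper proposes a variance reduction approach that eliminates the periodic full gradient computations required by methods such as SVRG and SARAH. It stays memory-efficient by combining the shuffling heuristic with the concept of SAG/SAGA methods, which improves efficiency and scalability in large-scale machine learning tasks.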

Findings

1. Convergence rates match standard shuffling methods for non-convex objectives.

2. Improved convergence under strong convexity.

3. Demonstrated scalability on CIFAR-10 and CIFAR-100 datasets.

Abstract

Stochastic optimization algorithms are widely used for machine learning with large-scale data. However, their convergence often suffers from non-vanishing variance. Variance Reduction (VR) methods, such as SVRG and SARAH, address this issue but introduce a bottleneck by requiring periodic full gradient computations. In this paper, we explore popular VR techniques and propose an approach that eliminates the necessity for expensive full gradient calculations. To avoid these computations and make our approach memory-efficient, we employ two key techniques: the shuffling heuristic and the concept of SAG/SAGA methods. For non-convex objectives, our convergence rates match those of standard shuffling methods, while under strong convexity, they demonstrate an improvement. We empirically validate the efficiency of our approach and demonstrate its scalability on large-scale machine learning…
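
The idea in the abstract can be illustrated with a small sketch. The standard SVRG estimator is ∇f_i(x) − ∇f_i(w) + ∇f(w), where w is a reference point and ∇f(w) is the full gradient that has to be recomputed periodically. The code below is not the authors' algorithm. It is a minimal illustration, under stated assumptions, of how a shuffled pass over the data might supply a surrogate for ∇f(w): the gradients already evaluated during one epoch are averaged and reused in the next. The least-squares objective, the warm-up epoch, the step size, and all function names (`grad_i`, `loss`, `shuffled_vr_sgd`) are assumptions made for this example.

```python
import random


def grad_i(a, b, x):
    """Gradient of the per-sample loss f_i(x) = 0.5 * (a.x - b)^2."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - b
    return [r * aj for aj in a]


def loss(A, b, x):
    """Average least-squares loss over all samples."""
    total = 0.0
    for a, bi in zip(A, b):
        r = sum(aj * xj for aj, xj in zip(a, x)) - bi
        total += 0.5 * r * r
    return total / len(A)


def shuffled_vr_sgd(A, b, x0, lr=0.02, epochs=30, seed=0):
    """SVRG-style updates without any explicit full-gradient pass.

    Assumption for illustration: the full gradient at the reference
    point is replaced by the average of the stochastic gradients
    collected along the previous shuffled epoch.
    """
    rng = random.Random(seed)
    n, d = len(A), len(x0)
    x = list(x0)
    ref = None         # reference point w; None during the warm-up epoch
    v = [0.0] * d      # surrogate for the full gradient at ref
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)              # sample without replacement
        acc = [0.0] * d                # running average for the next epoch
        for i in perm:
            g_x = grad_i(A[i], b[i], x)
            acc = [s + g / n for s, g in zip(acc, g_x)]
            if ref is None:
                step = g_x             # warm-up: plain shuffled SGD
            else:
                g_ref = grad_i(A[i], b[i], ref)
                step = [gx - gr + vj for gx, gr, vj in zip(g_x, g_ref, v)]
            x = [xj - lr * sj for xj, sj in zip(x, step)]
        v, ref = acc, list(x)          # reuse this epoch's gradients
    return x
```

Each inner step evaluates at most two per-sample gradients, and no epoch contains a separate pass that computes the full gradient. The surrogate `v` is biased, because it averages gradients taken at different iterates rather than at `ref` itself; the sketch makes no claim about the convergence rates stated in the paper.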

Taxonomy

Topics: Model Reduction and Neural Networks

Methods: Stochastic Gradient Descent