Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
Zhirayr Tovmasyan, Artavazd Maranjyan, Peter Richtárik

TL;DR
This paper introduces Rennala MVR, a variance-reduced extension of Rennala SGD, demonstrating improved time complexity for parallel stochastic optimization in heterogeneous systems through theoretical analysis and empirical validation.
Contribution
The paper proposes Rennala MVR, described as the first variance reduction method designed for time complexity in heterogeneous parallel systems, and supports it with theoretical bounds and practical experiments.
Findings
Variance reduction improves time complexity under mean-squared smoothness.
Rennala MVR outperforms Rennala SGD in both theory and practice.
Empirical results on neural networks show significant gains over existing methods.
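For orientation, here is a minimal sketch of the generic momentum-based variance reduction (MVR) gradient estimator from the literature, which the method's name appears to refer to. It is a single-process illustration on a toy quadratic, not the paper's Rennala MVR algorithm: the objective, the function names (`stochastic_grad`, `mvr_run`), and the parameters (`lr`, `a`, `sigma`) are all assumptions made for this example. The point it shows is that each update evaluates the stochastic gradient at the current and the previous iterate using the same random sample, so the difference corrects the running estimate.

```python
import random


def stochastic_grad(x, noise):
    # Hypothetical toy objective f(x) = 0.5 * x**2, whose gradient is x,
    # observed with additive noise standing in for the random sample.
    return x + noise


def mvr_run(x0, steps, lr, a, sigma, seed=0):
    """Run `steps` MVR updates after one plain SGD step from x0.

    a in (0, 1] is the momentum parameter; a = 1 recovers plain SGD.
    """
    rng = random.Random(seed)
    x_prev = x0
    g = stochastic_grad(x0, rng.gauss(0.0, sigma))  # initial estimate
    x = x_prev - lr * g
    for _ in range(steps):
        noise = rng.gauss(0.0, sigma)
        # Same sample at both points: fresh gradient plus a correction
        # of the previous estimate, damped by (1 - a).
        g = stochastic_grad(x, noise) + (1.0 - a) * (
            g - stochastic_grad(x_prev, noise)
        )
        x_prev, x = x, x - lr * g
    return x


if __name__ == "__main__":
    print(mvr_run(x0=1.0, steps=200, lr=0.1, a=0.1, sigma=0.5))
    print(mvr_run(x0=1.0, steps=200, lr=0.1, a=1.0, sigma=0.5))  # plain SGD
```

In this toy setting the estimator's error obeys e_t = (1 - a) e_{t-1} + a * noise_t, so a smaller `a` damps the noise entering each step. How Rennala MVR combines such an estimator with gradient collection from heterogeneous workers is specified in the paper itself and is not reproduced here.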
Abstract
Large-scale machine learning models are trained on clusters of machines that exhibit heterogeneous performance due to hardware variability, network delays, and system-level instabilities. In such environments, time complexity rather than iteration complexity becomes the relevant performance metric for optimization algorithms. Recent work by Tyurin and Richtárik (2023) established the first time complexity analysis for parallel first-order stochastic optimization, proposing Rennala SGD as a time-optimal method for smooth nonconvex optimization. However, Rennala SGD is fundamentally a modification of SGD, and variance reduction techniques are known to improve the iteration complexity of SGD. In this work, we investigate whether variance reduction can also improve time complexity in heterogeneous systems. We show that, under a mean-squared smoothness assumption, variance reduction can…
