Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

Zhirayr Tovmasyan; Artavazd Maranjyan; Peter Richtárik

arXiv:2605.08871 · math.OC · May 12, 2026

TL;DR

This paper introduces Rennala MVR, a variance-reduced extension of Rennala SGD, demonstrating improved time complexity for parallel stochastic optimization in heterogeneous systems through theoretical analysis and empirical validation.

Contribution

The paper proposes Rennala MVR, the first variance reduction method designed around time complexity in heterogeneous parallel systems, and supports it with theoretical bounds and experiments.
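"MVR" stands for momentum-based variance reduction. As a point of reference only, the sketch below shows the generic STORM-style MVR estimator on a single worker, d_t = ∇f(x_t; ξ_t) + (1 − a)(d_{t−1} − ∇f(x_{t−1}; ξ_t)), where one fresh sample ξ_t is evaluated at both the current and the previous iterate. This is not the Rennala MVR algorithm itself; how the paper combines such an estimator with batch collection across heterogeneous workers is not described in this summary. The toy objective, the noise model, and the names `stoch_grad`, `mvr_sgd`, `a`, and `lr` are hypothetical.

```python
import numpy as np


def stoch_grad(x, xi):
    # Hypothetical toy problem: f(x) = 0.5 * ||x||^2 with multiplicative noise xi.
    # E[xi] = 0, so (1 + xi) * x is an unbiased estimate of grad f(x) = x.
    return (1.0 + xi) * x


def mvr_sgd(x0, steps=200, lr=0.1, a=0.1, noise=0.5, seed=0):
    """Single-worker SGD driven by a generic STORM-style MVR estimator (illustrative only)."""
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    # Initialise the estimator with one plain stochastic gradient.
    d = stoch_grad(x_prev, rng.normal(0.0, noise, size=x_prev.shape))
    x = x_prev - lr * d
    for _ in range(steps):
        # Draw ONE sample and evaluate it at both the new and the previous iterate.
        xi = rng.normal(0.0, noise, size=x.shape)
        # d_t = g(x_t; xi) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi))
        d = stoch_grad(x, xi) + (1.0 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x
```

Setting `a=1.0` drops the correction term and recovers plain SGD. With a small `a`, the estimator keeps most of its previous value, and much of the fresh noise enters only through the difference of two gradient evaluations at nearby iterates.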

Findings

01

Variance reduction improves time complexity under mean-squared smoothness.

02

Rennala MVR outperforms Rennala SGD in both theory and practice.

03

Empirical results on neural networks show significant gains over existing methods.

Abstract

Large-scale machine learning models are trained on clusters of machines that exhibit heterogeneous performance due to hardware variability, network delays, and system-level instabilities. In such environments, time complexity rather than iteration complexity becomes the relevant performance metric for optimization algorithms. Recent work by Tyurin and Richtárik (2023) established the first time complexity analysis for parallel first-order stochastic optimization, proposing Rennala SGD as a time-optimal method for smooth nonconvex optimization. However, Rennala SGD is fundamentally a modification of SGD, and variance reduction techniques are known to improve the iteration complexity of SGD. In this work, we investigate whether variance reduction can also improve time complexity in heterogeneous systems. We show that, under a mean-squared smoothness assumption, variance reduction can…
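To make the distinction between time and iteration complexity concrete, the sketch below measures the wall-clock length of one batch-collection round under a simplified model that is an assumption, not a detail taken from the abstract: worker i needs a fixed tau_i seconds per stochastic gradient, every worker restarts at the new iterate when a round begins, and the round ends once B gradients have arrived in total, in the spirit of Rennala-style collection. The round length is then the smallest t with Σ_i ⌊t / tau_i⌋ ≥ B. The function name `round_time` is hypothetical.

```python
import heapq


def round_time(taus, batch_size):
    """Wall-clock time until `batch_size` gradients arrive, assuming worker i
    finishes one gradient every taus[i] seconds and never waits on the others."""
    # Min-heap of (next completion time, worker index).
    heap = [(tau, i) for i, tau in enumerate(taus)]
    heapq.heapify(heap)
    t = 0.0
    for _ in range(batch_size):
        t, i = heapq.heappop(heap)               # earliest gradient to finish
        heapq.heappush(heap, (t + taus[i], i))   # that worker starts its next gradient
    return t
```

For example, with `taus = [1, 1, 1, 100]` and a batch of 6, the round ends at time 2 and the slow worker contributes nothing, whereas a synchronous scheme that waits for one gradient from every worker would spend 100 seconds per step. Counting iterations alone hides this difference.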
