Parallel Momentum Methods Under Biased Gradient Estimations

Ali Beikmohammadi; Sarit Khirirat; Sindri Magnússon

arXiv:2403.00853 · cs.LG · January 14, 2025 · 1 citation

TL;DR

This paper analyzes the convergence of parallel momentum methods in distributed machine learning when gradients are biased, providing theoretical bounds and demonstrating improved convergence over biased gradient descent.

Contribution

It establishes worst-case convergence bounds for momentum methods with biased gradients on non-convex and $μ$-PL problems, covering practical scenarios such as compression and clipping.
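
For reference, the Polyak-Łojasiewicz (PL) condition with parameter $μ > 0$ is commonly stated as below. This is the standard textbook form, given here as an assumption about what "$μ$-PL" means; the paper's exact statement and notation are not shown on this page. Here $f^\star$ denotes the minimum value of $f$.

$$
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr) \quad \text{for all } x .
$$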

Findings

1. Momentum methods outperform biased gradient descent in experiments.

2. Theoretical bounds confirm faster convergence with biased gradients.

3. Analysis applies to meta-learning and distributed optimization.

Abstract

Parallel stochastic gradient methods are gaining prominence in solving large-scale machine learning problems that involve data distributed across multiple nodes. However, obtaining unbiased stochastic gradients, which have been the focus of most theoretical research, is challenging in many distributed machine learning applications. The gradient estimations easily become biased, for example, when gradients are compressed or clipped, when data is shuffled, and in meta-learning and reinforcement learning. In this work, we establish worst-case bounds on parallel momentum methods under biased gradient estimation on both general non-convex and $μ$-PL problems. Our analysis covers general distributed optimization problems, and we work out the implications for special cases where gradient estimates are biased, i.e., in meta-learning and when the gradients are compressed or clipped. Our…
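
To make the setting concrete, here is a minimal sketch of a parallel momentum loop in which each worker sends a top-k compressed gradient, one common source of bias. It is an illustration under stated assumptions, not the paper's algorithm: the function names `top_k` and `parallel_momentum`, the least-squares objective, the exponential-moving-average form of momentum, and all default parameter values are hypothetical choices made for this example.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest.

    This is a biased compressor: its output does not equal v in expectation.
    """
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]  # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out

def parallel_momentum(A_list, b_list, x0, lr=0.05, beta=0.9, k=2, steps=500):
    """Hypothetical momentum loop over n workers (illustrative only).

    Worker i holds f_i(x) = 0.5 * ||A_i x - b_i||^2 and sends the top-k
    compression of its local gradient; the server averages the messages,
    updates a momentum buffer, and takes a step.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    n = len(A_list)
    for _ in range(steps):
        # Average of the compressed (biased) local gradients.
        g = sum(top_k(A.T @ (A @ x - b), k) for A, b in zip(A_list, b_list)) / n
        m = beta * m + (1.0 - beta) * g  # momentum as a moving average of g
        x = x - lr * m
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A_list = [rng.standard_normal((8, 4)) for _ in range(3)]
    b_list = [rng.standard_normal(8) for _ in range(3)]
    x = parallel_momentum(A_list, b_list, np.zeros(4), k=2)
    print("final iterate:", x)
```

Setting `k` equal to the dimension turns the compressor into the identity, which gives an uncompressed baseline to compare against. The sketch makes no claim about convergence under compression; that is what the bounds described in the abstract address.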
