Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

Ammar Mahran; Artavazd Maranjyan; Peter Richtárik

arXiv:2605.13434 · cs.LG · May 14, 2026

TL;DR

Rescaled Asynchronous SGD makes a simple adjustment to standard ASGD: it rescales each worker's stepsize in proportion to that worker's computation time. This ensures convergence to the true global objective despite data and system heterogeneity.

Contribution

The paper shows that rescaling worker-specific stepsizes in ASGD corrects the bias caused by heterogeneity, without extra communication or memory overhead.
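A back-of-the-envelope calculation shows where the bias comes from and why rescaling removes it. This is a heuristic sketch, not the paper's analysis. It assumes each worker i needs a fixed time τ_i per gradient, ignores gradient staleness and noise, and takes the global objective to be f = (1/n) Σ_i f_i. The normalization by τ_max is an illustrative choice; the source only states that stepsizes are proportional to computation times.

```latex
% Over a window of length T, worker i returns about T/\tau_i gradients.
% Vanilla ASGD (same stepsize \gamma for every worker), average drift:
\Delta x \approx -\gamma \sum_{i=1}^{n} \frac{T}{\tau_i}\,\nabla f_i(x)
  \;\propto\; -\sum_{i=1}^{n} w_i\,\nabla f_i(x),
\qquad w_i = \frac{1/\tau_i}{\sum_{j=1}^{n} 1/\tau_j},
% so the fixed point minimizes \sum_i w_i f_i rather than f = \frac{1}{n}\sum_i f_i.
% Rescaled stepsizes \gamma_i = \gamma\,\tau_i/\tau_{\max}:
\Delta x \approx -\sum_{i=1}^{n} \frac{T}{\tau_i}\cdot\frac{\gamma\,\tau_i}{\tau_{\max}}\,\nabla f_i(x)
  = -\frac{\gamma\,T\,n}{\tau_{\max}}\,\nabla f(x).
```

The factor 1/τ_i (update frequency) cancels against the factor τ_i (stepsize), so every worker contributes equally per unit of wall-clock time.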

Findings

01

Rescaled ASGD converges to the correct global objective in heterogeneous settings.

02

The method's time complexity matches the theoretical lower bounds.

03

Experiments show competitive convergence and accuracy.

Abstract

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient arrives. Vanilla ASGD applies each arriving gradient with the same weight. When local data distributions are heterogeneous, this becomes problematic: faster workers contribute more updates, and we show theoretically that the method is biased toward a frequency-weighted average of the local objectives rather than the desired global objective. Existing remedies typically move away from the simple ASGD template by introducing gathering phases, buffering, or extra memory. We show that this is unnecessary. Keeping the standard ASGD mechanism, we recover the correct objective by rescaling worker-specific stepsizes in proportion to their computation times, so…
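The bias described in the abstract can be reproduced with a small event-driven simulation. The sketch below is illustrative only and makes several assumptions. Its two workers have hypothetical scalar quadratic objectives f_i(x) = 0.5 (x - b_i)^2, fixed compute times τ_i, and exact (noise-free) gradients. The function name `simulate_asgd` and the rule γ_i = γ τ_i / τ_max are choices made for this sketch, not the authors' code. Each worker reads the current model, spends τ_i time units computing a gradient, and the server applies that (possibly stale) gradient the moment it arrives.

```python
import heapq


def simulate_asgd(b, tau, gamma, rescale, horizon):
    """Hypothetical event-driven ASGD simulation on f_i(x) = 0.5 * (x - b[i])**2.

    Worker i takes a fixed time tau[i] per gradient. Returns the time-averaged
    server iterate over [horizon / 2, horizon].
    """
    tau_max = max(tau)
    # Assumed rescaling rule: stepsize proportional to compute time,
    # normalized so the slowest worker uses the base stepsize gamma.
    steps = [gamma * t / tau_max if rescale else gamma for t in tau]

    x = 0.0
    # Every worker starts computing a gradient at the initial model x = 0.
    events = [(tau[i], i, x) for i in range(len(b))]
    heapq.heapify(events)

    last_t, acc, acc_time = 0.0, 0.0, 0.0
    while events:
        t, i, x_read = heapq.heappop(events)
        if t > horizon:
            break
        # x held its current value on [last_t, t); accumulate the second half only.
        if t > horizon / 2:
            start = max(last_t, horizon / 2)
            acc += x * (t - start)
            acc_time += t - start
        last_t = t
        grad = x_read - b[i]      # gradient evaluated at the (stale) model it read
        x -= steps[i] * grad      # server applies it as soon as it arrives
        heapq.heappush(events, (t + tau[i], i, x))  # worker restarts on the fresh model
    return acc / acc_time


b = [0.0, 10.0]    # local minimizers; the global objective is minimized at 5.0
tau = [1.0, 4.0]   # worker 0 is four times faster than worker 1
print(simulate_asgd(b, tau, 0.01, rescale=False, horizon=4000.0))  # close to 2.0
print(simulate_asgd(b, tau, 0.01, rescale=True, horizon=4000.0))   # close to 5.0
```

With equal stepsizes, the fast worker sends 4 of every 5 updates, so the iterate settles near the frequency-weighted minimizer 0.8 · 0 + 0.2 · 10 = 2.0. With stepsizes proportional to τ_i, both workers pull equally hard per unit time, and the iterate settles near the global minimizer 5.0.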
