Toward Understanding the Impact of Staleness in Distributed Machine Learning
Wei Dai, Yi Zhou, Nanqing Dong, Hao Zhang, Eric P. Xing

TL;DR
This paper investigates how stale parameters in distributed machine learning systems affect convergence, combining an extensive empirical study with a new theoretical analysis of SGD under staleness that matches the best-known O(1/√T) convergence rate.
Contribution
The paper presents an extensive empirical study of staleness effects across a wide array of ML models and algorithms, and introduces a convergence analysis of stochastic gradient descent under staleness in non-convex optimization.
Findings
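The effects of staleness on convergence vary widely across ML models and algorithms.
The empirical results offer insight into seemingly contradictory reports in the literature.
The new convergence analysis of SGD under staleness in non-convex optimization matches the best-known O(1/√T) rate.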
Abstract
Many distributed machine learning (ML) systems adopt the non-synchronous execution in order to alleviate the network communication bottleneck, resulting in stale parameters that do not reflect the latest updates. Despite much development in large-scale ML, the effects of staleness on learning are inconclusive as it is challenging to directly monitor or control staleness in complex distributed environments. In this work, we study the convergence behaviors of a wide array of ML models and algorithms under delayed updates. Our extensive experiments reveal the rich diversity of the effects of staleness on the convergence of ML algorithms and offer insights into seemingly contradictory reports in the literature. The empirical findings also inspire a new convergence analysis of stochastic gradient descent in non-convex optimization under staleness, matching the best-known convergence rate of…
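To make the notion of a "delayed update" concrete, here is a minimal toy sketch, not the paper's experimental setup. It runs SGD on the one-dimensional quadratic f(x) = 0.5·x², with each gradient evaluated at a parameter copy that is a fixed number of steps old. The function name `stale_sgd`, the fixed-delay model, the noise level, and the learning rate are all assumptions chosen for illustration.

```python
import random
from collections import deque


def stale_sgd(staleness, steps=200, lr=0.05, noise=0.1, seed=0):
    """Run SGD on f(x) = 0.5 * x**2 with gradients evaluated at stale parameters.

    `staleness` is a fixed delay (in steps) between the parameter copy used to
    compute the gradient and the parameter being updated. This is a toy model
    (an assumption for illustration), not a real distributed system.
    Returns the list of iterates, including the starting point.
    """
    rng = random.Random(seed)
    x = 5.0
    # Keep the last `staleness + 1` iterates; the oldest one is the stale copy.
    history = deque([x], maxlen=staleness + 1)
    trajectory = [x]
    for _ in range(steps):
        stale_x = history[0]
        # Noisy gradient of 0.5 * x**2, evaluated at the stale point.
        grad = stale_x + rng.gauss(0.0, noise)
        x -= lr * grad
        history.append(x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    for s in (0, 10, 50):
        traj = stale_sgd(s, steps=2000)
        tail = max(abs(v) for v in traj[-200:])
        print(f"staleness={s:3d}  max |x| over last 200 steps = {tail:.4g}")
```

In this toy setting, the same learning rate that converges with no delay or a small delay becomes unstable once the delay is large enough. That is one simple way staleness can change convergence behavior; the range of effects the paper reports across real models and algorithms is broader than this sketch can show.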
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Age of Information Optimization · Privacy-Preserving Technologies in Data
