AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms
Rustem Islamov, Mher Safaryan, Dan Alistarh

TL;DR
This paper gives a unified convergence analysis of asynchronous SGD algorithms for non-convex smooth objectives in heterogeneous distributed settings, introduces a new asynchronous method based on worker shuffling, and supports the theory with numerical results.
Contribution
It offers a unified convergence theory for asynchronous SGD in heterogeneous environments and introduces a novel worker shuffling technique.
Findings
The analysis yields convergence guarantees for pure asynchronous SGD and its modifications.
Theoretical rates match the best-known results for related algorithms.
Numerical results confirm practical effectiveness of the proposed methods.
Abstract
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds, as well as data distribution. In these algorithms, workers compute possibly stale and stochastic gradients associated with their local data at some iteration back in history and then return those gradients to the server without synchronizing with other workers. We present a unified convergence theory for non-convex smooth functions in the heterogeneous regime. The proposed analysis provides convergence for pure asynchronous SGD and its various modifications. Moreover, our theory explains what affects the convergence rate and what can be done to improve the performance of asynchronous algorithms. In particular, we introduce a novel asynchronous method based on worker shuffling. As a by-product of our analysis, we also demonstrate…
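The update pattern described in the abstract (workers return stale stochastic gradients to the server without waiting for each other) can be illustrated with a small event-driven simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm and not its worker shuffling method: the function name `async_sgd`, the scalar quadratic losses, the fixed per-worker `compute_times`, and the Gaussian gradient noise are all hypothetical choices made for illustration.

```python
import heapq
import random


def async_sgd(targets, compute_times, lr=0.05, num_steps=500, noise_std=0.1, seed=0):
    """Toy simulation of pure asynchronous SGD (illustrative assumptions only).

    Worker i holds the local loss f_i(x) = 0.5 * (x - targets[i])**2, so data
    heterogeneity is modelled by the workers having different targets.
    compute_times[i] is the (assumed constant) time worker i needs per gradient.
    """
    rng = random.Random(seed)
    x = 0.0  # server model
    # Each job is (finish_time, worker_id, model snapshot the worker started from).
    jobs = [(compute_times[i], i, x) for i in range(len(targets))]
    heapq.heapify(jobs)
    updates_per_worker = [0] * len(targets)

    for _ in range(num_steps):
        # The next worker to finish reports its gradient; nobody waits for the others.
        finish_time, i, x_stale = heapq.heappop(jobs)
        # The gradient is evaluated at the stale snapshot, not at the current model.
        grad = (x_stale - targets[i]) + rng.gauss(0.0, noise_std)
        x -= lr * grad
        updates_per_worker[i] += 1
        # The worker immediately starts a new job from the current server model.
        heapq.heappush(jobs, (finish_time + compute_times[i], i, x))

    return x, updates_per_worker
```

Calling, for example, `async_sgd([-1.0, 0.0, 4.0], [1.0, 2.0, 5.0])` returns the final server model together with the number of updates each worker contributed. In this toy setup the fastest worker contributes the most updates, which gives a concrete picture of how unequal speeds and unequal data interact in the heterogeneous regime the paper studies.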
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
