Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework
Yijie Zhou, Shi Pu

TL;DR
This paper proposes an asynchronous decentralized SGD (ADSGD) framework that tolerates computation and communication delays and data heterogeneity, with a convergence proof and faster wall-clock convergence than existing methods in experiments.
Contribution
It introduces a refined ADSGD model, analyzes its convergence under the practical assumptions of bounded computation and communication times, and demonstrates faster wall-clock convergence than existing methods.
Findings
The convergence of ADSGD is established without assuming bounded data heterogeneity.
Empirical results show ADSGD outperforms existing methods in wall-clock time.
The method is robust to communication and computation delays.
Abstract
Decentralized optimization has become vital for leveraging distributed data without central control, enhancing scalability and privacy. However, practical deployments face fundamental challenges due to heterogeneous computation speeds and unpredictable communication delays. This paper introduces a refined model of Asynchronous Decentralized Stochastic Gradient Descent (ADSGD) under practical assumptions of bounded computation and communication times. To understand the convergence of ADSGD, we first analyze Asynchronous Stochastic Block Coordinate Descent (ASBCD) as a tool, and then show that ADSGD converges under computation-delay-independent step sizes. The convergence result is established without assuming bounded data heterogeneity. Empirical experiments reveal that ADSGD outperforms existing methods in wall-clock convergence time across various scenarios. With its simplicity,…
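The idea of analyzing asynchronous block-coordinate updates can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes a toy convex quadratic on a ring (the paper itself targets non-convex objectives), a uniformly random block choice, and a random read delay bounded by a hypothetical `max_delay`. Each update uses the block's own current value together with possibly stale copies of its neighbours, which loosely mirrors an agent working with outdated neighbour information. All function and parameter names are illustrative assumptions.

```python
import random
from collections import deque


def block_gradient(x_own, x_stale, i, targets):
    """Partial derivative w.r.t. block i of the toy ring objective
    f(x) = 0.5 * sum_j (x_j - t_j)^2 + 0.25 * sum_j (x_j - x_{j+1})^2,
    evaluated with the block's own value and (possibly stale) neighbours."""
    n = len(targets)
    left, right = x_stale[(i - 1) % n], x_stale[(i + 1) % n]
    return 2.0 * x_own - targets[i] - 0.5 * (left + right)


def async_block_sgd(targets, steps=5000, step_size=0.1, max_delay=5,
                    noise_std=0.0, seed=0):
    """Simulate asynchronous stochastic block-coordinate descent.

    Assumption: staleness is modelled by reading neighbours from one of
    the last (max_delay + 1) stored iterates, chosen at random.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    # Bounded history of past iterates; the newest iterate is at the right.
    history = deque([list(x)], maxlen=max_delay + 1)
    for _ in range(steps):
        i = rng.randrange(n)                  # block updated at this step
        delay = rng.randrange(len(history))   # bounded read delay
        x_stale = history[-1 - delay]
        # Stochastic gradient: exact partial derivative plus Gaussian noise.
        g = block_gradient(x[i], x_stale, i, targets) + rng.gauss(0.0, noise_std)
        x[i] -= step_size * g
        history.append(list(x))
    return x


def max_abs_gradient(x, targets):
    """Largest absolute partial derivative at x (no staleness)."""
    return max(abs(block_gradient(x[i], x, i, targets)) for i in range(len(x)))
```

For example, calling `async_block_sgd([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])` and passing the result to `max_abs_gradient` gives a quick check that the stale-read updates still drive the gradient toward zero on this toy problem; setting `noise_std` above zero adds gradient noise.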
Taxonomy
Topics: Auction Theory and Applications · Economic theories and models
