Asynchronous Fully-Decentralized SGD in the Cluster-Based Model
Hagit Attiya, Noa Schiller

TL;DR
This paper introduces fault-tolerant asynchronous decentralized SGD algorithms for the cluster-based model. The algorithms tolerate crash failures and, for strongly convex functions, achieve the maximal distributed acceleration over the optimal convergence rate of sequential SGD.
Contribution
The paper develops novel asynchronous decentralized SGD algorithms for cluster-based models that are fault-tolerant and achieve near-optimal convergence rates, even with process failures.
Findings
Achieves maximal distributed acceleration for strongly convex functions.
Convergence rate matches sequential SGD up to a logarithmic factor for arbitrary functions.
Requires a majority of non-faulty processes in clusters for non-convex optimization.
Abstract
This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function, as a core part of optimization and learning algorithms. Our algorithms are designed for the cluster-based model, which combines message-passing and shared-memory communication layers. Processes may fail by crashing, and the algorithm inside each cluster is wait-free, using only reads and writes. For a strongly convex function, our algorithm tolerates any number of failures, and provides a convergence rate that yields the maximal distributed acceleration over the optimal convergence rate of sequential SGD. For arbitrary functions, the convergence rate has an additional term that depends on the maximal difference between the parameters at the same iteration. (This holds under standard assumptions on the cost function.) In this case,…
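For orientation, the sequential SGD baseline that the abstract compares against can be sketched as follows. This is only an illustration of plain sequential SGD on a strongly convex quadratic, not the paper's distributed algorithm. The names `sgd` and `make_noisy_quadratic_grad`, the quadratic cost, the Gaussian noise model, and the 1/(t+1) step size are all assumptions chosen for the example.

```python
import random

def sgd(grad_estimate, x0, steps, lr):
    """Plain sequential SGD: x <- x - lr(t) * g, where g is a noisy gradient at x."""
    x = list(x0)
    for t in range(steps):
        g = grad_estimate(x)   # stochastic gradient at the current point
        step = lr(t)           # step size for iteration t
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def make_noisy_quadratic_grad(target, noise, rng):
    """Gradient of 0.5 * ||x - target||^2 plus zero-mean Gaussian noise (hypothetical cost)."""
    def grad(x):
        return [(xi - ti) + rng.gauss(0.0, noise) for xi, ti in zip(x, target)]
    return grad

rng = random.Random(0)      # fixed seed so the run is reproducible
target = [1.0, -2.0]        # minimizer of the illustrative cost
grad = make_noisy_quadratic_grad(target, noise=0.1, rng=rng)

# A decaying step size of 1 / (t + 1) is a common choice for strongly convex costs.
x_final = sgd(grad, x0=[0.0, 0.0], steps=2000, lr=lambda t: 1.0 / (t + 1))
print(x_final)
```

With this step size the iterate after T steps equals the minimizer minus the average of the T noise samples, so the error shrinks as more stochastic gradients are used. Distributed acceleration, in the abstract's sense, refers to gaining from gradients computed by many processes rather than one.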
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs
