Dual-Delayed Asynchronous SGD for Arbitrarily Heterogeneous Data
Xiaolu Wang, Yuchang Sun, Hoi-To Wai, Jun Zhang

TL;DR
The paper introduces DuDe-ASGD, a novel asynchronous SGD algorithm that handles highly heterogeneous data across distributed workers and achieves a near-optimal convergence rate without requiring a bounded dissimilarity condition on the workers' data.
Contribution
It proposes DuDe-ASGD, which uses stale stochastic gradients from all workers together with incremental aggregation to neutralize the effects of data heterogeneity in asynchronous distributed learning.
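To illustrate incremental aggregation of stale gradients, here is a minimal event-driven simulation. It is a sketch under stated assumptions, not the authors' DuDe-ASGD implementation. The quadratic local losses `f_i(x) = 0.5 * (x - centers[i])**2`, the exponential compute times, and the function name `incremental_async_sgd` are all hypothetical. The server keeps the most recent gradient from every worker. When one worker returns, the server swaps in its new entry and steps along the average of all entries, so slow workers still contribute even though their gradients are stale. In this simplified sketch the data sample is drawn when the gradient is computed, so it does not model a separate data-sample lag.

```python
import heapq
import random


def incremental_async_sgd(centers, lr=0.05, steps=2000, noise=0.01, seed=0):
    """Event-driven sketch of asynchronous SGD with incremental aggregation.

    Hypothetical setup: worker i holds f_i(x) = 0.5 * (x - centers[i])**2,
    so the minimizer of the average objective is mean(centers), however
    dissimilar the local objectives are.
    """
    rng = random.Random(seed)
    n = len(centers)
    x = 0.0
    # Latest (possibly stale) gradient reported by each worker, plus their running sum.
    table = [0.0] * n
    total = 0.0
    # Every worker starts from the initial model; the heap orders completion times.
    events = []
    for i in range(n):
        heapq.heappush(events, (rng.expovariate(1.0), i, x))
    for _ in range(steps):
        now, i, x_stale = heapq.heappop(events)
        # Noisy gradient of f_i evaluated at the stale model this worker received.
        g = (x_stale - centers[i]) + rng.gauss(0.0, noise)
        # Incremental aggregation: replace worker i's old entry in O(1).
        total += g - table[i]
        table[i] = g
        # Step along the average of the most recent gradients from all workers.
        x -= lr * total / n
        # Send the fresh model back; worker i is slower on average for larger i.
        heapq.heappush(events, (now + rng.expovariate(1.0 / (i + 1)), i, x))
    return x


print(incremental_async_sgd([-10.0, 5.0, 20.0, 1.0]))
```

The running sum keeps the aggregation cost at O(1) per server step regardless of the number of workers. Because every worker's latest gradient stays in the average, a fast worker cannot pull the model toward its own local optimum. In this toy example the iterate settles near `mean(centers) = 4.0`.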
Findings
Achieves a near-minimax-optimal convergence rate for nonconvex problems.
Maintains computational efficiency comparable to traditional asynchronous SGD.
Outperforms existing asynchronous and synchronous SGD algorithms in experiments.
Abstract
We consider the distributed learning problem with data dispersed across multiple workers under the orchestration of a central server. Asynchronous Stochastic Gradient Descent (SGD) has been widely explored in such a setting to reduce the synchronization overhead associated with parallelization. However, the performance of asynchronous SGD algorithms often depends on a bounded dissimilarity condition among the workers' local data, a condition that can drastically affect their efficiency when the workers' data are highly heterogeneous. To overcome this limitation, we introduce the dual-delayed asynchronous SGD (DuDe-ASGD) algorithm designed to neutralize the adverse effects of data heterogeneity. DuDe-ASGD makes full use of stale stochastic gradients from all workers during asynchronous training, leading to two distinct time lags in the model parameters and data samples utilized…
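Schematically, a server step that applies the most recent stochastic gradient from every worker can be written as below. Each gradient is evaluated at a stale model and a stale data sample. The notation is assumed here rather than taken from the paper, and this is not claimed to be the exact DuDe-ASGD update. $F_i$ denotes worker $i$'s loss, $\xi_i$ its data samples, $\eta$ the step size, $n$ the number of workers, $\tau_i^t$ the model-parameter lag, and $d_i^t$ the data-sample lag of worker $i$'s latest gradient at server step $t$.

```latex
x^{t+1} = x^{t} - \frac{\eta}{n} \sum_{i=1}^{n} \nabla F_i\!\left( x^{\,t-\tau_i^{t}} ;\; \xi_i^{\,t-d_i^{t}} \right)
```

Every worker appears in the sum at every step, so no single worker's local objective dominates the update direction. The two lags $\tau_i^t$ and $d_i^t$ correspond to the "two distinct time lags" mentioned in the abstract.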
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Data Compression Techniques · Image and Signal Denoising Methods
Methods: Stochastic Gradient Descent
