Asynchronous Training Schemes in Distributed Learning with Time Delay
Haoxiang Wang, Zhanhong Jiang, Chao Liu, Soumik Sarkar, Dongxiang Jiang, Young M. Lee

TL;DR
This paper introduces PC-ASGD, a novel asynchronous distributed learning algorithm that predicts and clips gradients to mitigate delay effects, with theoretical convergence guarantees and empirical validation on neural networks.
Contribution
We propose PC-ASGD, combining gradient prediction and selective clipping to improve asynchronous training, with convergence analysis and practical implementation strategies.
Findings
PC-ASGD effectively reduces the impact of stale gradients.
Theoretical convergence is established for weakly strongly-convex and nonconvex functions.
Empirical results show improved training performance on neural networks.
Abstract
In the context of distributed deep learning, stale weights or gradients can result in poor algorithmic performance. This issue is usually tackled by delay-tolerant algorithms with some mild assumptions on the objective functions and step sizes. In this paper, we propose a different approach to develop a new algorithm, called Predicting Clipping Asynchronous Stochastic Gradient Descent (aka, PC-ASGD). Specifically, PC-ASGD has two steps: the predicting step leverages gradient prediction using Taylor expansion to reduce the staleness of the outdated weights, while the clipping step selectively drops the outdated weights to alleviate their negative effects. A tradeoff parameter is introduced to balance the effects between these two steps. Theoretically, we present the convergence rate…
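The two steps described in the abstract can be illustrated with a minimal sketch. It assumes a toy quadratic objective with a diagonal Hessian, so the Taylor correction is easy to write down. The function names (`grad`, `predict_gradient`, `pc_step`), the averaging of fresh and predicted gradients, and the convex blend through `theta` are all illustrative assumptions, not the update rule from the paper.

```python
from typing import List

Vector = List[float]


def grad(w: Vector, a: Vector, b: Vector) -> Vector:
    """Gradient of the toy objective f(w) = sum(0.5 * a_i * w_i**2 - b_i * w_i)."""
    return [ai * wi - bi for ai, wi, bi in zip(a, w, b)]


def predict_gradient(g_stale: Vector, hess_diag: Vector,
                     w_now: Vector, w_stale: Vector) -> Vector:
    """First-order Taylor expansion of the gradient around the stale weights:
    g(w_now) ~= g(w_stale) + H * (w_now - w_stale), with H diagonal (assumption)."""
    return [g + h * (wn - ws)
            for g, h, wn, ws in zip(g_stale, hess_diag, w_now, w_stale)]


def pc_step(w_now: Vector, g_fresh: Vector, g_stale: Vector,
            hess_diag: Vector, w_stale: Vector,
            lr: float, theta: float) -> Vector:
    """One hypothetical predict/clip update, blended by theta in [0, 1]."""
    g_pred = predict_gradient(g_stale, hess_diag, w_now, w_stale)
    # Predicting step (assumed form): average the fresh gradient with the
    # Taylor-corrected stale gradient.
    w_pre = [w - lr * 0.5 * (gf + gp)
             for w, gf, gp in zip(w_now, g_fresh, g_pred)]
    # Clipping step (assumed form): drop the stale contribution entirely.
    w_clip = [w - lr * gf for w, gf in zip(w_now, g_fresh)]
    # Tradeoff: theta = 1 uses only the predicting step, theta = 0 only clipping.
    return [theta * p + (1.0 - theta) * c for p, c in zip(w_pre, w_clip)]


# Usage on a two-dimensional toy problem.
a, b = [2.0, 0.5], [1.0, -1.0]
w_stale, w_now = [0.0, 0.0], [0.3, -0.4]
g_stale = grad(w_stale, a, b)   # gradient computed on outdated weights
g_fresh = grad(w_now, a, b)     # gradient computed on current weights
w_next = pc_step(w_now, g_fresh, g_stale, a, w_stale, lr=0.1, theta=0.5)
```

For this quadratic the Taylor correction recovers the current gradient exactly, so both steps agree. On a real network the Hessian is only approximated, and the tradeoff parameter decides how much to trust the prediction.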
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Age of Information Optimization · Sparse and Compressive Sensing Techniques
