ABS: Adaptive Bounded Staleness Converges Faster and Communicates Less
Qiao Tan, Feng Zhu, Jingjing Zhang

TL;DR
This paper introduces ABS, an adaptive method that combines synchronous and asynchronous strategies to improve convergence speed and reduce communication in distributed learning.
Contribution
The paper proposes a novel adaptive bounded staleness (ABS) strategy that dynamically balances staleness and synchronization to enhance distributed learning efficiency.
Findings
ABS converges faster than existing methods.
ABS reduces communication rounds in distributed training.
Simulation shows ABS outperforms state-of-the-art schemes.
Abstract
Wall-clock convergence time and the number of communication rounds are critical performance metrics for distributed learning in the parameter-server (PS) setting. Synchronous methods converge fast but are not robust to stragglers, while asynchronous ones reduce the wall-clock time per round but suffer from a degraded convergence rate due to gradient staleness; it is therefore natural to combine the two to achieve a balance. In this work, we develop a novel asynchronous strategy, named adaptive bounded staleness (ABS), that leverages the advantages of both synchronous and asynchronous methods. The key enablers of ABS are two-fold. First, the number of workers that the PS waits for per round for gradient aggregation is adaptively selected to strike a straggling-staleness balance. Second, the workers with relatively high staleness are required to start a new round of computation to alleviate…
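The two mechanisms named in the abstract can be illustrated with a minimal sketch of a single PS round. This is not the authors' algorithm: the abstract is truncated and does not give the rule for choosing how many workers to wait for, so the sketch takes that number `k` as an input. The staleness threshold `bound`, the function `ps_round`, and the `RoundResult` type are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoundResult:
    aggregated: List[int]  # workers whose gradients the PS used this round
    restarted: List[int]   # stale workers told to abandon work and restart
    staleness: List[int]   # per-worker staleness after the round


def ps_round(finish_times: List[float], staleness: List[int],
             k: int, bound: int) -> RoundResult:
    """One illustrative PS round (assumed semantics, not the paper's exact rule).

    The PS waits only for the k fastest workers, then forces any worker
    whose staleness would exceed `bound` to restart on the fresh model.
    """
    # Rank workers by when they finish their current gradient computation.
    order = sorted(range(len(finish_times)), key=lambda w: finish_times[w])
    aggregated = sorted(order[:k])

    new_staleness = list(staleness)
    restarted = []
    for w in range(len(finish_times)):
        if w in aggregated:
            new_staleness[w] = 0       # contributed, receives the fresh model
        else:
            new_staleness[w] += 1      # still computing on an older model
            if new_staleness[w] > bound:
                restarted.append(w)    # too stale: restart on the fresh model
                new_staleness[w] = 0
    return RoundResult(aggregated, restarted, new_staleness)


result = ps_round(finish_times=[1.0, 5.0, 2.0, 9.0],
                  staleness=[0, 2, 0, 1], k=2, bound=2)
print(result)
```

In this toy run the PS aggregates workers 0 and 2, worker 1 crosses the bound and restarts, and worker 3 keeps computing with staleness 2. A small `k` shortens each round at the cost of more stale gradients, while a large `k` approaches fully synchronous training; the adaptive choice of `k` is the part the abstract attributes to ABS but does not specify.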
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Neural Networks and Applications
