ABS: Adaptive Bounded Staleness Converges Faster and Communicates Less

Qiao Tan; Feng Zhu; Jingjing Zhang

arXiv:2301.08895 · cs.DC · January 22, 2024

Open Access

TL;DR

This paper introduces ABS, an adaptive method that combines synchronous and asynchronous strategies to improve convergence speed and reduce communication in distributed learning.

Contribution

The paper proposes a novel adaptive bounded staleness (ABS) strategy that dynamically balances staleness and synchronization to enhance distributed learning efficiency.

Findings

01 ABS converges faster than existing methods.

02 ABS reduces communication rounds in distributed training.

03 Simulation shows ABS outperforms state-of-the-art schemes.

Abstract

Wall-clock convergence time and the number of communication rounds are critical performance metrics in distributed learning in the parameter-server (PS) setting. Synchronous methods converge fast but are not robust to stragglers, whereas asynchronous methods reduce the wall-clock time per round but suffer from a degraded convergence rate due to gradient staleness; it is therefore natural to combine the two to strike a balance. In this work, we develop a novel asynchronous strategy, named adaptive bounded staleness (ABS), that leverages the advantages of both synchronous and asynchronous methods. The key enablers of ABS are twofold. First, the number of workers that the PS waits for per round for gradient aggregation is adaptively selected to strike a straggling-staleness balance. Second, the workers with relatively high staleness are required to start a new round of computation to alleviate…
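The two mechanisms named in the abstract can be illustrated with a minimal Python sketch of a single PS round. This is not the authors' algorithm: the abstract shown here does not state how the number of awaited workers is chosen, so `k` is passed in as a plain parameter, and the names `WorkerState`, `run_round`, and `staleness_bound`, along with the restart rule (restart once staleness would reach the bound), are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    worker_id: int
    finish_time: float  # time at which this worker's gradient is ready
    staleness: int      # rounds elapsed since the model version it uses


def run_round(workers, k, staleness_bound):
    """Simulate one PS round (illustrative assumptions, not the paper's rule).

    The PS waits for the k earliest gradients and aggregates them.
    Every worker left out becomes one round staler; those whose
    staleness would reach `staleness_bound` are told to restart.
    Returns (aggregated_ids, restarted_ids, round_end_time).
    """
    if not 1 <= k <= len(workers):
        raise ValueError("k must be between 1 and the number of workers")

    by_arrival = sorted(workers, key=lambda w: w.finish_time)
    aggregated = by_arrival[:k]
    round_end_time = aggregated[-1].finish_time  # PS waits for the k-th arrival

    aggregated_ids = [w.worker_id for w in aggregated]
    restarted_ids = [
        w.worker_id
        for w in by_arrival[k:]
        if w.staleness + 1 >= staleness_bound  # assumed restart rule
    ]
    return aggregated_ids, restarted_ids, round_end_time


workers = [
    WorkerState(0, 1.0, 0),
    WorkerState(1, 2.0, 0),
    WorkerState(2, 5.0, 2),
    WorkerState(3, 3.0, 0),
]
aggregated_ids, restarted_ids, round_end_time = run_round(
    workers, k=2, staleness_bound=3
)
```

In this toy round the PS stops waiting once two gradients have arrived, so the straggler (worker 2) does not delay the update; because that worker is already two rounds stale, it is asked to restart, while worker 3 keeps computing. A smaller `k` shortens the round but leaves more workers stale, which is the straggling-staleness trade-off the abstract describes.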

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Neural Networks and Applications