Accelerating Distributed ML Training via Selective Synchronization

Sahil Tyagi; Martin Swany

arXiv:2307.07950·cs.DC·January 30, 2024

TL;DR

This paper introduces SelSync, a dynamic synchronization method for distributed deep neural network training that reduces communication overhead while maintaining or improving accuracy, achieving up to 14× faster training.

Contribution

SelSync is a novel, low-overhead, semi-synchronous training approach that adaptively chooses synchronization steps based on update significance.

Findings

01 Converges to the same or better accuracy than BSP.

02 Reduces training time by up to 14×.

03 Effective in semi-synchronous training scenarios.

Abstract

In distributed training, deep neural networks (DNNs) are launched over multiple workers concurrently and aggregate their local updates on each step in bulk-synchronous parallel (BSP) training. However, BSP does not scale out linearly due to the high communication cost of aggregation. To mitigate this overhead, alternatives like Federated Averaging (FedAvg) and Stale-Synchronous Parallel (SSP) either reduce synchronization frequency or eliminate it altogether, usually at the cost of lower final accuracy. In this paper, we present SelSync, a practical, low-overhead method for DNN training that dynamically chooses to incur or avoid communication at each step, either by calling the aggregation op or by applying local updates, based on their significance. We propose various optimizations as part of SelSync to improve convergence in the context of semi-synchronous training.…
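
The per-step choice described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not say how significance is measured, so the sketch assumes a relative change in gradient norm compared against a hypothetical `threshold`. The names `selective_step` and `relative_change`, the rule that all workers synchronize when any one worker crosses the threshold, and the choice to average parameters as well as gradients on a synchronization step are likewise assumptions. Workers are simulated in-process with plain Python lists standing in for a real collective such as all-reduce.

```python
import math


def grad_norm(grad):
    """Euclidean norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(g * g for g in grad))


def relative_change(prev_norm, curr_norm, eps=1e-12):
    """Hypothetical significance score: relative change in gradient norm."""
    return abs(curr_norm - prev_norm) / (prev_norm + eps)


def average(vectors):
    """Element-wise mean of equally sized vectors (stands in for all-reduce)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def selective_step(params, grads, prev_norms, lr, threshold):
    """Run one training step for every simulated worker.

    params     : list of per-worker parameter vectors
    grads      : list of per-worker gradient vectors
    prev_norms : gradient norm of each worker on the previous step
    Returns (new_params, new_norms, synced).
    """
    norms = [grad_norm(g) for g in grads]
    scores = [relative_change(p, c) for p, c in zip(prev_norms, norms)]

    # Assumed rule: synchronize if any worker sees a significant update.
    synced = max(scores) >= threshold

    if synced:
        # Communication step: aggregate gradients and re-align replicas.
        mean_grad = average(grads)
        mean_params = average(params)
        updated = [w - lr * g for w, g in zip(mean_params, mean_grad)]
        new_params = [list(updated) for _ in params]
    else:
        # Local step: each worker applies its own gradient, no communication.
        new_params = [
            [w - lr * g for w, g in zip(p, gr)]
            for p, gr in zip(params, grads)
        ]
    return new_params, norms, synced
```

With a small `threshold` this sketch synchronizes on nearly every step, as BSP does; with a large one it runs mostly local updates and replicas drift apart until the next synchronization step.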

Code & Models

Repositories

sahiltyagi4/selsync (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Privacy-Preserving Technologies in Data