Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension

Yunfei Teng; Wenbo Gao; Francois Chalus; Anna Choromanska; Donald Goldfarb; Adrian Weller

arXiv:1905.10395·cs.LG·April 29, 2022·1 cites

TL;DR

This paper introduces a distributed training algorithm for deep learning in which each worker combines its own gradient step with a pull toward the currently best-performing worker (the leader), improving communication efficiency and convergence behavior over parameter-averaging methods such as EASGD.

Contribution

It proposes Leader Gradient Descent (LGD) and its stochastic and multi-leader variants, enhancing convergence, communication efficiency, and robustness in distributed deep learning training.

Findings

1. Outperforms state-of-the-art baselines in CNN training.
2. Reduces communication overhead by broadcasting only leader parameters.
3. Breaks the curse of symmetry, i.e. avoids being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes.

Abstract

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it…
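The two forces described in the abstract (a regular gradient step plus a corrective pull toward the leader) can be illustrated with a small single-process sketch. This is a toy under stated assumptions, not the authors' implementation: the function name `leader_sgd`, the parameters `lr`, `pull`, and `sync_every`, the full-gradient (non-stochastic) updates, and the quadratic objective are all hypothetical choices made for illustration.

```python
import numpy as np


def leader_sgd(grad_fn, loss_fn, workers, lr=0.1, pull=0.1, steps=100, sync_every=5):
    """Toy leader-guided descent (illustrative sketch, not the paper's code).

    Each worker takes a gradient step plus a pull toward the leader,
    i.e. the worker with the lowest loss at the most recent sync point.
    """
    workers = [np.asarray(w, dtype=float).copy() for w in workers]
    leader = min(workers, key=loss_fn).copy()
    for t in range(steps):
        # Assumption: the leader is re-selected only every `sync_every` steps,
        # mimicking infrequent communication between workers.
        if t % sync_every == 0:
            leader = min(workers, key=loss_fn).copy()
        # Update: gradient force plus corrective force toward the leader.
        workers = [w - lr * (grad_fn(w) + pull * (w - leader)) for w in workers]
    return workers, leader


if __name__ == "__main__":
    # Hypothetical toy objective f(x) = ||x||^2 with gradient 2x.
    loss = lambda x: float(np.sum(x ** 2))
    grad = lambda x: 2.0 * x
    starts = [np.array([3.0, -2.0]), np.array([-1.0, 4.0]), np.array([0.5, 0.5])]
    final_workers, final_leader = leader_sgd(grad, loss, starts, steps=200)
    for w in final_workers:
        print(w, loss(w))
```

In this sketch only the leader's parameters would need to be shared at each sync point, which is the intuition behind the communication saving listed in the findings. The pull term vanishes for the leader itself, so the leader follows plain gradient descent until another worker overtakes it.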

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Memory and Neural Computing