Exponential Graph is Provably Efficient for Decentralized Deep Training
Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin

TL;DR
This paper shows that exponential graphs make decentralized deep training efficient: they communicate quickly, since each node talks to only O(log(n)) neighbors, while still averaging effectively. As a result, decentralized SGD over these graphs reaches state-of-the-art efficiency with little communication overhead.
Contribution
It studies exponential graphs as a communication topology for decentralized SGD, proves that they are efficient, and shows that a sequence of one-peer exponential graphs, in which each node contacts a single neighbor per iteration, can achieve exact averaging with minimal communication.
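The exponential-graph topology named above can be sketched in a few lines. This is a minimal sketch, not the authors' code. It assumes the common definition in which node i is linked to nodes (i + 2^k) mod n for k = 0, ..., τ - 1 with τ = ⌈log2(n)⌉. It also assumes uniform weights 1/(τ + 1) on the node itself and on each of its τ neighbors. The function names `num_hops` and `static_exponential_graph` are hypothetical. Exact `Fraction` arithmetic lets us check that the resulting mixing matrix is doubly stochastic.

```python
from fractions import Fraction


def num_hops(n: int) -> int:
    """tau = ceil(log2(n)) for n >= 2, computed exactly with integer ops."""
    return (n - 1).bit_length()


def static_exponential_graph(n: int) -> list[list[Fraction]]:
    """Mixing matrix W for the (assumed) static exponential graph.

    Row i puts weight 1/(tau + 1) on node i itself and on each neighbor
    (i + 2^k) mod n for k = 0, ..., tau - 1. Because 2^(tau - 1) < n, these
    offsets are distinct and nonzero mod n, so no entry is written twice.
    """
    tau = num_hops(n)
    weight = Fraction(1, tau + 1)  # uniform weights: an assumption of this sketch
    w = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = weight
        for k in range(tau):
            w[i][(i + 2**k) % n] = weight
    return w


if __name__ == "__main__":
    W = static_exponential_graph(8)
    # Each node keeps its own value plus log2(8) = 3 neighbors.
    print([sum(1 for v in row if v != 0) for row in W])  # [4, 4, 4, 4, 4, 4, 4, 4]
    print(num_hops(1000))  # 10 neighbors per node for 1000 nodes
```

The degree grows only logarithmically with n, which is the "fast communication" half of the trade-off. Every column also sums to 1, so gossip with W preserves the network-wide average.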
Findings
Exponential graphs enable fast communication and effective averaging.
Sequences of one-peer exponential graphs can achieve exact averaging.
Decentralized SGD over exponential graphs achieves state-of-the-art efficiency.
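The exact-averaging finding can be checked numerically. This is a minimal sketch under stated assumptions, not the paper's implementation. At round k, node i averages its value with node (i + 2^(k mod τ)) mod n using weight 1/2 each. The helper names `one_peer_mix` and `run_rounds` are hypothetical. The exact-averaging property checked here holds when n is a power of two. After τ rounds, every node then holds the exact global mean, even though each node exchanged only one message per round.

```python
from fractions import Fraction


def one_peer_mix(x: list[Fraction], k: int) -> list[Fraction]:
    """One gossip round on the k-th one-peer exponential graph.

    Node i averages its value with node (i + 2^(k mod tau)) mod n, weight 1/2 each.
    """
    n = len(x)
    if n == 1:
        return list(x)
    tau = (n - 1).bit_length()  # ceil(log2(n)) for n >= 2
    hop = 2 ** (k % tau)
    return [(x[i] + x[(i + hop) % n]) / 2 for i in range(n)]


def run_rounds(x: list[Fraction], rounds: int) -> list[Fraction]:
    """Apply `rounds` consecutive one-peer rounds, cycling hops 2^0, 2^1, ..."""
    for k in range(rounds):
        x = one_peer_mix(x, k)
    return x


if __name__ == "__main__":
    x = [Fraction(v) for v in [3, 1, 4, 1, 5, 9, 2, 6]]
    # n = 8 is a power of two, so tau = 3 rounds give the exact mean 31/8 everywhere.
    print(run_rounds(x, 3) == [Fraction(31, 8)] * 8)  # True
```

For comparison, with n = 6 the same three rounds applied to a one-hot vector leave node weights 1/4, 1/8, 1/8, 1/8, 1/8, 1/4 rather than a uniform 1/6. This is why the power-of-two condition matters.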
Abstract
Decentralized SGD is an emerging training method for deep learning known for needing much less (and thus faster) communication per iteration; it relaxes the averaging step in parallel SGD to inexact averaging. The less exact the averaging, however, the more total iterations training needs. Therefore, the key to making decentralized SGD efficient is to realize nearly exact averaging using little communication. This requires a skillful choice of communication topology, an under-studied topic in decentralized optimization. In this paper, we study so-called exponential graphs, where every node is connected to O(log(n)) neighbors and n is the total number of nodes. This work proves that such graphs can lead to both fast communication and effective averaging simultaneously. We also discover that a sequence of one-peer exponential graphs, in which each node…
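The abstract's trade-off between communication per iteration and averaging quality can be illustrated with a toy run. This is a sketch under loud assumptions, not the paper's algorithm or experiments. Node i minimizes a hypothetical quadratic f_i(x) = 0.5 * (x - b_i)^2 with target b_i. Each iteration takes a local (optionally noisy) gradient step, then one round of one-peer exponential-graph averaging. This is the adapt-then-combine ordering, assumed here. The function name `decentralized_sgd` and its parameters are hypothetical.

```python
import random


def decentralized_sgd(targets, lr=0.1, steps=300, noise_std=0.0, seed=0):
    """Toy decentralized SGD on f_i(x) = 0.5 * (x - targets[i])**2 (assumed losses).

    Each iteration: a local gradient step, then one one-peer gossip round in
    which node i averages with node (i + 2^(k mod tau)) mod n.
    """
    n = len(targets)
    tau = max(1, (n - 1).bit_length())  # ceil(log2(n)) for n >= 2; 1 if n == 1
    rng = random.Random(seed)
    x = [0.0] * n
    for k in range(steps):
        # Adapt: gradient of f_i is (x_i - b_i); add optional Gaussian noise.
        half = [
            x[i] - lr * (x[i] - targets[i] + rng.gauss(0.0, noise_std))
            for i in range(n)
        ]
        # Combine: a single message per node per iteration.
        hop = 2 ** (k % tau)
        x = [(half[i] + half[(i + hop) % n]) / 2 for i in range(n)]
    return x


if __name__ == "__main__":
    targets = [float(i) for i in range(8)]
    x = decentralized_sgd(targets, noise_std=0.1)
    print(sum(x) / len(x))  # network average, near mean(targets) = 3.5
    print(max(abs(v - 3.5) for v in x))  # disagreement between nodes
```

Because each gossip matrix is doubly stochastic, the network average follows ordinary gradient descent on the mean loss. Exact averaging every τ rounds keeps individual nodes close to that average. Without communication, each node would instead drift to its own target.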
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
