GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent
Jeff Daily, Abhinav Vishnu, Charles Siegel, Thomas Warfel, Vinay Amatya

TL;DR
GossipGraD introduces a scalable asynchronous gossip-based SGD algorithm that significantly reduces communication overhead and maintains high efficiency for large-scale deep learning on GPU and CPU clusters.
Contribution
Proposes a gossip communication protocol for SGD that reduces communication complexity from Θ(log p) to O(1) for p compute nodes, enabling efficient large-scale deep learning.
Findings
Achieves near 100% efficiency on 128 GPUs for ResNet50.
Maintains top-1 accuracy comparable to state-of-the-art.
Reduces communication complexity from Θ(log p) to O(1).
Abstract
In this paper, we present GossipGraD - a gossip communication protocol based Stochastic Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on large-scale systems. The salient features of GossipGraD are: 1) reduction in overall communication complexity from Θ(log(p)) for p compute nodes in well-studied SGD to O(1), 2) model diffusion such that compute nodes exchange their updates (gradients) indirectly after every log(p) steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during the feedforward phase in SGD to prevent over-fitting, 5) asynchronous communication of gradients for further reducing the communication cost of SGD and GossipGraD. We implement GossipGraD for GPU and CPU clusters and use NVIDIA GPUs (Pascal P100) connected with InfiniBand, and Intel Knights…
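The diffusion idea in features 1) to 3) can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes a hypercube partner schedule (partner = rank XOR 2^step), a power-of-two node count, and plain pairwise averaging of one scalar per node. The function names `gossip_partner`, `gossip_round`, and `simulate_diffusion` are hypothetical. Each node talks to exactly one partner per step, and the partner rotates from step to step.

```python
def gossip_partner(rank, step, p):
    """Partner of `rank` at `step` under an assumed hypercube schedule.

    Assumption (not from the paper summary): p is a power of two and the
    partner is found by flipping one bit of the rank, rotating the bit
    position every step.
    """
    dims = p.bit_length() - 1
    if p < 2 or (1 << dims) != p:
        raise ValueError("this sketch assumes p is a power of two, p >= 2")
    return rank ^ (1 << (step % dims))


def gossip_round(values, step):
    """One gossip step: every node averages its value with one partner.

    Each node performs a single exchange, so the per-step communication
    cost does not grow with the number of nodes.
    """
    p = len(values)
    return [
        (values[r] + values[gossip_partner(r, step, p)]) / 2
        for r in range(p)
    ]


def simulate_diffusion(values):
    """Run log2(p) gossip steps and return the resulting per-node values."""
    p = len(values)
    steps = p.bit_length() - 1  # log2(p) for a power-of-two p
    for step in range(steps):
        values = gossip_round(values, step)
    return values


if __name__ == "__main__":
    # Eight nodes, each starting with its own rank as a stand-in "update".
    result = simulate_diffusion([float(r) for r in range(8)])
    print(result)  # every entry is 3.5, the global mean of 0..7
```

Under these assumptions, after log2(p) steps every node holds a value influenced by every other node, even though it only ever exchanged data with one partner per step. For p = 128 that is 7 steps with one exchange per node each, compared with the Θ(log p) exchanges per step of a tree- or butterfly-style all-reduce.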
Taxonomy
Topics: Molecular Communication and Nanonetworks · Advanced Memory and Neural Computing · Cellular Automata and Applications
Methods: 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling
