GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent

Jeff Daily; Abhinav Vishnu; Charles Siegel; Thomas Warfel; Vinay Amatya

arXiv:1803.05880 · cs.DC · March 16, 2018 · 81 cites

TL;DR

GossipGraD introduces a scalable, asynchronous, gossip-based SGD algorithm that reduces communication overhead and maintains high efficiency for large-scale deep learning on GPU and CPU clusters.

Contribution

It proposes a gossip communication protocol for SGD that reduces the overall communication complexity from Θ(log p) to O(1) for p compute nodes, enabling efficient large-scale deep learning.
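The gap between Θ(log p) and O(1) can be illustrated with a simple alpha-beta cost model. The sketch below is a minimal example under stated assumptions. It uses a recursive-doubling allreduce as the Θ(log p) baseline, which needs ceil(log2 p) rounds that each send the full gradient. Pairwise gossip needs one exchange with a single partner per step. The function names and the latency and bandwidth values are hypothetical and are not taken from the paper. Note that bandwidth-optimal ring allreduce has a different cost profile.

```python
import math

def allreduce_step_cost(p: int, n_bytes: float, alpha: float, beta: float) -> float:
    """Hypothetical baseline: recursive-doubling allreduce.
    ceil(log2 p) rounds, each sending the full n-byte gradient
    (alpha = per-message latency in s, beta = time per byte in s)."""
    rounds = math.ceil(math.log2(p)) if p > 1 else 0
    return rounds * (alpha + n_bytes * beta)

def gossip_step_cost(p: int, n_bytes: float, alpha: float, beta: float) -> float:
    """Pairwise gossip: one exchange with a single partner per step,
    so the cost does not grow with p."""
    return (alpha + n_bytes * beta) if p > 1 else 0.0

if __name__ == "__main__":
    # Hypothetical numbers: 5 us latency, ~12.5 GB/s links, 100 MB of gradients.
    alpha, beta, n = 5e-6, 8e-11, 1e8
    for p in (16, 128, 1024):
        ratio = allreduce_step_cost(p, n, alpha, beta) / gossip_step_cost(p, n, alpha, beta)
        print(f"p={p:5d}  allreduce/gossip cost ratio = {ratio:.0f}")
```

Under this model the per-step ratio grows as ceil(log2 p), while the gossip cost stays flat as nodes are added.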

Findings

01

Achieves near-100% compute efficiency for ResNet50 on 128 NVIDIA P100 GPUs.

02

Maintains top-1 accuracy comparable to state-of-the-art results.

03

Reduces communication complexity from Θ(log p) to O(1).
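The following assumes the common scaling-efficiency definition for interpreting an "efficiency" figure like the one in the first finding. Here X(p) is a hypothetical symbol for training throughput (for example, images per second) on p GPUs. The paper may measure compute efficiency differently, for instance as the fraction of time spent in computation rather than communication.

```latex
E(p) = \frac{X(p)}{p \cdot X(1)}, \qquad E(p) = 1 \iff X(p) = p \cdot X(1)
```

Under this definition, near-100% efficiency on 128 GPUs means throughput close to 128 times that of a single GPU.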

Abstract

In this paper, we present GossipGraD - a gossip communication protocol based Stochastic Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on large-scale systems. The salient features of GossipGraD are: 1) reduction in overall communication complexity from Θ(log(p)) for p compute nodes in well-studied SGD to O(1), 2) model diffusion such that compute nodes exchange their updates (gradients) indirectly after every log(p) steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during the feedforward phase in SGD to prevent over-fitting, 5) asynchronous communication of gradients for further reducing the communication cost of SGD and GossipGraD. We implement GossipGraD for GPU and CPU clusters and use NVIDIA GPUs (Pascal P100) connected with InfiniBand, and Intel Knights…
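Features 2 and 3 (indirect diffusion after log(p) steps and rotating partners) can be illustrated with a small simulation. This is a minimal sketch, not the paper's implementation.

- **Partner schedule (assumed).** Each rank pairs with the rank that differs in one bit of its position (a hypercube or XOR schedule) on a permuted rank order.
- **Rotation (assumed).** The permutation is redrawn every log2(p) steps.
- **Averaging.** Paired ranks average toy model vectors.
- **Not modeled.** Asynchrony, gradient exchange, and sample shuffling are left out.
- **Hypothetical names.** `gossip_partners` and `simulate` are illustrative only, and p is assumed to be a power of two.

```python
import numpy as np

def gossip_partners(p: int, step: int, perm: np.ndarray) -> np.ndarray:
    """Hypothetical partner schedule: hypercube (XOR) pairing applied to a
    permuted rank order. Returns partner[r] for every rank r."""
    assert p > 1 and p & (p - 1) == 0, "sketch assumes p is a power of two"
    d = p.bit_length() - 1            # log2(p)
    bit = 1 << (step % d)             # hypercube dimension used at this step
    pos = np.empty(p, dtype=int)
    pos[perm] = np.arange(p)          # pos[rank] = virtual position of that rank
    return perm[pos ^ bit]            # rank at the neighbouring virtual position

def simulate(p: int = 8, dim: int = 4, seed: int = 0) -> list:
    """Pairwise model averaging; returns the max deviation from the
    global mean after each step."""
    rng = np.random.default_rng(seed)
    models = rng.normal(size=(p, dim))   # one toy model vector per rank
    target = models.mean(axis=0)         # pairwise averaging preserves this mean
    d = p.bit_length() - 1
    errors = []
    for step in range(2 * d):
        if step % d == 0:
            perm = rng.permutation(p)    # rotate partners every log2(p) steps
        partner = gossip_partners(p, step, perm)
        # Each rank exchanges with exactly one partner: O(1) messages per step.
        models = 0.5 * (models + models[partner])
        errors.append(float(np.abs(models - target).max()))
    return errors

if __name__ == "__main__":
    print(simulate(p=8))
```

With p = 8, the deviation from the global mean falls to floating-point noise after log2(8) = 3 steps, even though no rank ever talks to more than one partner per step. This is consistent with the abstract's description of updates diffusing indirectly after log(p) steps.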

Taxonomy

Topics: Molecular Communication and Nanonetworks · Advanced Memory and Neural Computing · Cellular Automata and Applications

Methods: 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling