Gossip training for deep learning
Michael Blot, David Picard, Matthieu Cord, Nicolas Thome

TL;DR
This paper introduces GoSGD, a decentralized and asynchronous gossip-inspired method for distributed stochastic gradient descent, which accelerates deep learning training by improving consensus among multiple threads.
Contribution
Proposes GoSGD, a novel gossip-based distributed training algorithm for deep learning that is fully asynchronous and decentralized, enhancing convergence speed.
Findings
GoSGD shows promising results compared to EASGD on CIFAR-10.
The method achieves good consensus convergence properties.
It targets faster training of convolutional networks by sharing information between threads.
Abstract
We address the issue of speeding up the training of convolutional networks. We study a distributed method adapted to stochastic gradient descent (SGD). The parallel optimization setup uses several threads, each applying individual gradient descent steps to a local variable. We propose a new way to share information between threads, inspired by gossip algorithms, that shows good consensus convergence properties. Our method, called GoSGD, has the advantage of being fully asynchronous and decentralized. We compare our method to the recent EASGD \cite{elastic} on CIFAR-10, and the results are encouraging.
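The setup described in the abstract (several workers running local SGD and occasionally exchanging parameters with a random peer) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the GoSGD update rule from the paper: the scalar quadratic objective, the plain pairwise averaging, the exchange probability `p_exchange`, and all function names are hypothetical choices made for illustration, and the "threads" are simulated sequentially in a single loop.

```python
import random


def local_gradient(x, target, rng, noise=0.1):
    # Noisy gradient of the toy objective 0.5 * (x - target)^2.
    # The objective is an assumption for illustration only.
    return (x - target) + rng.gauss(0.0, noise)


def gossip_sgd(num_workers=4, steps=500, lr=0.05, p_exchange=0.1,
               target=3.0, seed=0):
    rng = random.Random(seed)
    # Each worker keeps its own local copy of the parameter.
    params = [rng.uniform(-5.0, 5.0) for _ in range(num_workers)]
    for _ in range(steps):
        for i in range(num_workers):
            # Local SGD step; no central server is involved.
            params[i] -= lr * local_gradient(params[i], target, rng)
            # With small probability, push the local value to one random
            # peer, which mixes it into its own copy (simple averaging,
            # an assumed mixing rule).
            if rng.random() < p_exchange:
                j = rng.choice([k for k in range(num_workers) if k != i])
                params[j] = 0.5 * (params[j] + params[i])
    return params


params = gossip_sgd()
spread = max(params) - min(params)  # rough measure of consensus
print(params, spread)
```

In this toy setting the workers end up close to the shared minimizer and close to each other, which is the consensus behaviour the abstract refers to. A real implementation would run workers concurrently and exchange full network weights.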
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Memory and Neural Computing · Distributed Control Multi-Agent Systems
