Gossip training for deep learning

Michael Blot; David Picard; Matthieu Cord; Nicolas Thome

arXiv:1611.09726 · cs.CV · November 30, 2016 · 37 cites

TL;DR

This paper introduces GoSGD, a decentralized and asynchronous gossip-inspired method for distributed stochastic gradient descent, which accelerates deep learning training by improving consensus among multiple threads.

Contribution

Proposes GoSGD, a novel gossip-based distributed training algorithm for deep learning that is fully asynchronous and decentralized, enhancing convergence speed.

Findings

01 GoSGD shows promising results compared to EASGD on CIFAR-10.

02 The method achieves good consensus convergence properties.

03 It effectively speeds up training of convolutional networks.

Abstract

We address the issue of speeding up the training of convolutional networks. Here we study a distributed method adapted to stochastic gradient descent (SGD). The parallel optimization setup uses several threads, each applying individual gradient descents on a local variable. We propose a new way to share information between different threads, inspired by gossip algorithms and showing good consensus convergence properties. Our method, called GoSGD, has the advantage of being fully asynchronous and decentralized. We compared our method to the recent EASGD \cite{elastic} on CIFAR-10 and show encouraging results.
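
The setup described in the abstract (several workers running local SGD and occasionally sharing their variables with a random peer) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exchange rule, so the code uses a generic weighted gossip-averaging step in which the sender halves its weight and the receiver takes a weighted average. The function name `gossip_sgd`, the parameter `p_exchange`, the toy objective f(x) = 0.5 (x - 3)^2, and the sequential simulation of asynchrony are all hypothetical choices made for illustration.

```python
import random


def gossip_sgd(num_workers=4, steps=2000, lr=0.05, p_exchange=0.1, seed=0):
    """Toy simulation: local SGD on f(x) = 0.5 * (x - 3)^2 with random gossip.

    Assumption: the exchange rule below is a generic weighted gossip average,
    chosen for illustration; it is not taken from the abstract.
    """
    rng = random.Random(seed)
    target = 3.0
    # Each worker keeps its own local variable and a gossip weight.
    x = [rng.uniform(-5.0, 5.0) for _ in range(num_workers)]
    w = [1.0 / num_workers] * num_workers

    for _ in range(steps):
        # One worker wakes up at a time (asynchrony simulated sequentially).
        i = rng.randrange(num_workers)

        # Local SGD step with a noisy gradient of the toy objective.
        grad = (x[i] - target) + rng.gauss(0.0, 0.1)
        x[i] -= lr * grad

        # With probability p_exchange, push the local variable to a random peer.
        if rng.random() < p_exchange:
            j = rng.choice([k for k in range(num_workers) if k != i])
            w[i] /= 2.0  # sender keeps half its weight and sends the other half
            # Receiver mixes its variable with the incoming one, by weight.
            x[j] = (w[j] * x[j] + w[i] * x[i]) / (w[j] + w[i])
            w[j] += w[i]

    return x, w


if __name__ == "__main__":
    values, weights = gossip_sgd()
    print("local variables:", values)
    print("gossip weights:", weights)
```

No worker waits for another and there is no central parameter server in this sketch: every exchange is a one-way push to a single peer, which is the sense in which a gossip scheme is asynchronous and decentralized. The total gossip weight is conserved by each exchange, so the weighted average of the local variables is unchanged by communication and moves only through the gradient steps.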

Code & Models

Repositories

uoguelph-mlrg/Theano-MPI

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Memory and Neural Computing · Distributed Control Multi-Agent Systems