High Throughput Synchronous Distributed Stochastic Gradient Descent

Michael Teng; Frank Wood

arXiv:1803.04209 · cs.DC · March 14, 2018 · 1 citation

Open Access

TL;DR

This paper presents a novel synchronous distributed stochastic gradient descent algorithm that uses a generative model to predict worker run-times, letting the parameter server adjust its gradient cutoff time dynamically; this improves throughput and reduces training time.

Contribution

It introduces a deep generative model for predicting worker run-times and shows that a parameter server can use these predictions to set its gradient-aggregation cutoff dynamically during distributed training.

Findings

01. A dynamic cutoff based on run-time prediction increases throughput.

02. Using the generative run-time model to set the cutoff reduces neural network training time.

03. Eagerly discarding straggler gradients improves convergence speed as a function of wall-clock time.
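To make "discarding straggler gradients" concrete, here is a minimal in-process sketch of one synchronous SGD step in which the parameter server averages only the gradients of workers that finished by the cutoff. It assumes all worker gradients and their run-times are already available as arrays; in a real cluster the cutoff is a wall-clock deadline and late messages are simply not waited for. The function names `aggregate_before_cutoff` and `sgd_step` and the learning rate `lr` are hypothetical illustrations, not the paper's code.

```python
import numpy as np

def aggregate_before_cutoff(grads, run_times, cutoff):
    """Average the mini-batch gradients of workers that finished by `cutoff`.

    grads: shape (num_workers, num_params), one gradient per worker.
    run_times: shape (num_workers,), compute time of each worker.
    Gradients from workers with run_time > cutoff (stragglers) are dropped.
    """
    grads = np.asarray(grads, dtype=float)
    run_times = np.asarray(run_times, dtype=float)
    on_time = run_times <= cutoff  # boolean mask of non-straggler workers
    if not on_time.any():
        raise ValueError("no worker finished before the cutoff")
    return grads[on_time].mean(axis=0)

def sgd_step(params, grads, run_times, cutoff, lr=0.1):
    """One synchronous SGD update using only on-time gradients (hypothetical lr)."""
    return np.asarray(params, dtype=float) - lr * aggregate_before_cutoff(
        grads, run_times, cutoff
    )

# Usage: the third worker (run-time 5.0) misses the 2.0 cutoff and is ignored.
params = np.zeros(2)
grads = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
run_times = [0.9, 1.1, 5.0]
new_params = sgd_step(params, grads, run_times, cutoff=2.0)
```

Averaging over only the on-time workers keeps each update an unbiased-looking mini-batch mean over fewer samples; the trade-off the paper studies is how many workers to wait for before updating.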

Abstract

We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to perform joint posterior predictive inference of the mini-batch gradient computation times of all worker-nodes in a parallel computing cluster. We show that a synchronous parameter server can, by utilizing such a model, choose an optimal cutoff time, beyond which mini-batch gradient messages from slow workers are ignored, that maximizes overall mini-batch gradient computations per second. In keeping with earlier findings, we observe that, under realistic conditions, eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but actually increases the overall rate of convergence as a function of wall-clock time by virtue…
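The cutoff choice described in the abstract can be sketched as follows, under stated assumptions. We assume the run-time model has already produced joint posterior predictive samples (each row one draw of every worker's run-time); how those samples are generated is not shown, and any predictor could stand in for the paper's generative model. We also assume the objective is expected gradients per second, i.e. the expected number of workers finished by cutoff c divided by c. `choose_cutoff` is a hypothetical name. Because the expected count only changes at sampled run-times and c grows between those points, the maximum is attained at one of the sampled times, so those are the only candidates checked.

```python
import numpy as np

def choose_cutoff(run_time_samples):
    """Pick the cutoff maximizing expected gradients per second (assumed objective).

    run_time_samples: shape (num_samples, num_workers); each row is one joint
    predictive draw of all workers' mini-batch run-times.
    Returns (best_cutoff, expected_throughput_at_best_cutoff).
    """
    samples = np.asarray(run_time_samples, dtype=float)
    # Sorted unique sampled times; the optimum lies at one of them.
    candidates = np.unique(samples)
    candidates = candidates[candidates > 0]
    # finished[s, k]: number of workers done by candidates[k] in draw s.
    finished = (samples[:, :, None] <= candidates[None, None, :]).sum(axis=1)
    # Average over draws, then divide by the cutoff to get gradients/second.
    expected_throughput = finished.mean(axis=0) / candidates
    best = int(np.argmax(expected_throughput))
    return candidates[best], expected_throughput[best]

# Usage with two hand-written joint draws for four workers.
cutoff, throughput = choose_cutoff([[1.0, 1.0, 1.0, 4.0],
                                    [1.0, 2.0, 2.0, 2.0]])
```

Here waiting until 1.0 yields on average 2 gradients in 1.0 time units, which beats waiting for more workers at 2.0 (3.5 gradients in 2.0) or 4.0 (4 gradients in 4.0). Counting finished workers per joint draw, rather than per worker independently, preserves any correlation between workers that the joint predictive captures.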

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications