High Throughput Synchronous Distributed Stochastic Gradient Descent
Michael Teng, Frank Wood

TL;DR
This paper presents a novel synchronous distributed stochastic gradient descent algorithm that uses a generative model to predict worker run-times, enabling dynamic cutoff adjustments that improve throughput and reduce training time.
Contribution
It introduces a deep generative model for worker run-time prediction and demonstrates its effectiveness in dynamically optimizing gradient aggregation in distributed training.
Findings
Dynamic cutoff based on run-time prediction increases throughput.
Setting the cutoff with the generative run-time model reduces neural network training time.
Eagerly discarding straggler gradients increases the rate of convergence as a function of wall-clock time.
Abstract
We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to perform joint posterior predictive inference of the mini-batch gradient computation times of all worker-nodes in a parallel computing cluster. We show that a synchronous parameter server can, by utilizing such a model, choose an optimal cutoff time beyond which mini-batch gradient messages from slow workers are ignored that maximizes overall mini-batch gradient computations per second. In keeping with earlier findings we observe that, under realistic conditions, eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but actually increases the overall rate of convergence as a function of wall-clock time by virtue…
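The cutoff selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the run-time model has already produced samples of per-worker gradient computation times, and it simply picks the candidate cutoff that maximizes the expected number of gradients received per second. The function name choose_cutoff and the array layout are hypothetical, chosen for illustration only.

```python
import numpy as np


def choose_cutoff(predicted_runtimes):
    """Pick the cutoff time that maximizes expected gradients per second.

    predicted_runtimes: array-like of shape (num_samples, num_workers) holding
    sampled per-worker run-times in seconds (assumed positive). In the paper
    these would come from the generative model; here they are just inputs.
    Returns (cutoff_seconds, expected_gradients_per_second).
    """
    samples = np.asarray(predicted_runtimes, dtype=float)
    if samples.ndim == 1:
        # Treat a flat vector as a single predictive sample.
        samples = samples[None, :]

    # Throughput can only change at a worker's finishing time, so the
    # sampled run-times themselves are sufficient candidate cutoffs.
    candidates = np.unique(samples)

    best_cutoff, best_rate = None, -1.0
    for c in candidates:
        # Expected number of workers that finish by time c, averaged over samples.
        finished = (samples <= c).sum(axis=1).mean()
        rate = finished / c
        if rate > best_rate:
            best_cutoff, best_rate = float(c), float(rate)
    return best_cutoff, best_rate


if __name__ == "__main__":
    # Three fast workers and one straggler, as a single predictive sample.
    cutoff, rate = choose_cutoff([[1.0, 1.1, 1.2, 5.0]])
    print(cutoff, rate)
```

In this toy input, waiting for the straggler at 5.0 s would yield 4 gradients in 5 s, whereas cutting off at 1.2 s yields 3 gradients in 1.2 s, so the sketch selects the 1.2 s cutoff and drops the straggler.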
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications
