Tuning the Scheduling of Distributed Stochastic Gradient Descent with Bayesian Optimization

Valentin Dalibard; Michael Schaarschmidt; Eiko Yoneki

arXiv:1612.00383 · stat.ML · December 4, 2016


Open Access

TL;DR

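This paper introduces an optimizer that uses Bayesian optimization to tune the system parameters of distributed SGD. It replaces the usual Gaussian process with a probabilistic model that simulates distributed SGD, so that efficient configurations are found quickly even in setups with over thirty parameters.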

Contribution

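The paper develops a probabilistic model that simulates the behavior of distributed SGD and uses it within Bayesian optimization. The model can exploit many runtime measurements per evaluation of the objective function, which enables rapid tuning in high-dimensional parameter spaces.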

Findings

01 The optimizer converges to efficient configurations within ten iterations.

02 Its configurations outperform those of generic optimizers by up to 2X.

03 It handles setups with over thirty parameters.

Abstract

We present an optimizer which uses Bayesian optimization to tune the system parameters of distributed stochastic gradient descent (SGD). Given a specific context, our goal is to quickly find efficient configurations which appropriately balance the load between the available machines to minimize the average SGD iteration time. Our experiments consider setups with over thirty parameters. Traditional Bayesian optimization, which uses a Gaussian process as its model, is not well suited to such high-dimensional domains. To reduce convergence time, we exploit the available structure. We design a probabilistic model which simulates the behavior of distributed SGD and use it within Bayesian optimization. Our model can exploit many runtime measurements for inference per evaluation of the objective function. Our experiments show that our resulting optimizer converges to efficient configurations…
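
To make the tuning objective concrete, the following is a minimal sketch of the load-balancing problem the abstract describes. It is not the authors' probabilistic model. It assumes a toy cost model in which each machine processes its share of a batch at a fixed speed and a synchronous iteration takes as long as the slowest machine. The names `iteration_time`, `proportional_shares`, and `random_search`, the machine speeds, and the batch size are all hypothetical, chosen only for illustration. Random search stands in for a generic, structure-agnostic optimizer.

```python
import random


def iteration_time(batch_shares, speeds, batch_size=1024):
    """Toy cost model (an assumption): a synchronous SGD step takes as long
    as the slowest machine needs to process its share of the batch."""
    return max(share * batch_size / speed
               for share, speed in zip(batch_shares, speeds))


def proportional_shares(speeds):
    """Structure-aware choice: give each machine work proportional to its speed,
    so that all machines finish at the same time under the toy model."""
    total = sum(speeds)
    return [s / total for s in speeds]


def random_search(speeds, trials=30, seed=0):
    """Generic baseline: sample random load splits and keep the fastest one."""
    rng = random.Random(seed)
    best_shares, best_time = None, float("inf")
    for _ in range(trials):
        raw = [rng.random() for _ in speeds]
        norm = sum(raw)
        shares = [r / norm for r in raw]  # shares sum to 1
        t = iteration_time(shares, speeds)
        if t < best_time:
            best_shares, best_time = shares, t
    return best_shares, best_time


# Hypothetical heterogeneous cluster: examples processed per second per machine.
speeds = [100.0, 200.0, 400.0, 800.0]

uniform = [1 / len(speeds)] * len(speeds)
t_uniform = iteration_time(uniform, speeds)
t_balanced = iteration_time(proportional_shares(speeds), speeds)
_, t_random = random_search(speeds)

print(f"uniform split:      {t_uniform:.3f} s per iteration")
print(f"random search (30): {t_random:.3f} s per iteration")
print(f"proportional split: {t_balanced:.3f} s per iteration")
```

Under this toy model the proportional split is optimal, so a search that knows nothing about the structure can at best approach it. That is the intuition behind exploiting the available structure; a real deployment would also have to account for communication costs and measurement noise, which this sketch ignores.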

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Advanced Neural Network Applications

Methods: Gaussian Process · Stochastic Gradient Descent