Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

Stefan Hadjis; Ce Zhang; Ioannis Mitliagkas; Dan Iter; Christopher Ré

arXiv:1606.04487 · cs.DC · October 20, 2016 · 50 cites

TL;DR

This paper introduces Omnivore, a system that optimizes multi-device deep learning training on CPUs and GPUs by improving throughput, tuning asynchronous parallelization, and efficiently allocating resources, resulting in significantly faster training times.

Contribution

Omnivore characterizes how system design choices interact with the optimization algorithm, and provides an efficient hyperparameter optimizer that trains faster than existing systems.

Findings

01. Achieves at least 5.5x throughput improvement on CPUs.

02. Provides a hyperparameter optimizer that reduces training time by up to 12x.

03. Demonstrates that tuning asynchronous parallelization is crucial for optimal performance.

Abstract

We study the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. We first focus on the single-node setting and show that by using standard batching and data-parallel techniques, throughput can be improved by at least 5.5x over state-of-the-art systems on CPUs. This ensures an end-to-end training speed directly proportional to the throughput of a device regardless of its underlying hardware, allowing each node in the cluster to be treated as a black box. Our second contribution is a theoretical and empirical study of the tradeoffs affecting end-to-end training time in a multiple-device setting. We identify the degree of asynchronous parallelization as a key factor affecting both hardware and statistical efficiency. We…
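The tradeoff named in the abstract can be illustrated with a toy model. The sketch below is not Omnivore's optimizer: the cost functions, the parameter values, and every function name are hypothetical, chosen only to show why the degree of asynchrony has an interior optimum. It assumes that more asynchronous groups shorten each iteration (hardware efficiency) while staleness increases the number of iterations needed to converge (statistical efficiency).

```python
# Toy model of the hardware-vs-statistical efficiency tradeoff.
# All functions and constants are illustrative assumptions, not
# measurements or formulas from the paper.

def seconds_per_iteration(groups, compute_s=1.0, overhead_s=0.1):
    """Hypothetical hardware efficiency: more asynchronous groups
    means less waiting per update, down to a fixed overhead."""
    return compute_s / groups + overhead_s

def iterations_to_converge(groups, base_iters=1000, staleness_penalty=0.15):
    """Hypothetical statistical efficiency: stale gradients from more
    asynchronous groups require more iterations to converge."""
    return base_iters * (1.0 + staleness_penalty * (groups - 1))

def total_training_seconds(groups):
    """End-to-end time is the product of the two efficiencies."""
    return seconds_per_iteration(groups) * iterations_to_converge(groups)

def best_group_count(candidates):
    """Pick the degree of asynchrony that minimizes total time."""
    return min(candidates, key=total_training_seconds)

if __name__ == "__main__":
    candidates = [1, 2, 4, 8, 16, 32]
    for g in candidates:
        print(f"groups={g:2d}  total={total_training_seconds(g):8.2f} s")
    print("best:", best_group_count(candidates))
```

Under these made-up constants, neither fully synchronous training (one group) nor maximal asynchrony wins; the minimum falls in between, which is the qualitative point the abstract makes about tuning the degree of asynchronous parallelization.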

Code & Models

Repositories

HazyResearch/Omnivore (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Advanced Memory and Neural Computing