SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization

Jeff Kinnison; Nathaniel Kremer-Herman; Douglas Thain; Walter Scheirer

arXiv:1707.01428·cs.LG·January 23, 2018

TL;DR

SHADHO is a scalable, hardware-aware framework for distributed hyperparameter optimization that improves efficiency by considering hardware heterogeneity and search space complexity, leading to better model performance.

Contribution

Introduces SHADHO, a novel framework that dynamically assigns hyperparameter search tasks to heterogeneous hardware based on complexity and performance metrics.

Findings

01

Achieves double the throughput of standard methods on SVM for MNIST.

02

Discovered 515 better-performing U-Net models in a week using 74 GPUs.

03

Effectively balances search across heterogeneous hardware environments.

Abstract

Computer vision is experiencing an AI renaissance, in which machine learning models are expediting important breakthroughs in academic research and commercial applications. Effectively training these models, however, is not trivial due in part to hyperparameters: user-configured values that control a model's ability to learn from data. Existing hyperparameter optimization methods are highly parallel but make no effort to balance the search across heterogeneous hardware or to prioritize searching high-impact spaces. In this paper, we introduce a framework for massively Scalable Hardware-Aware Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the relative complexity of each search space and monitors performance on the learning task over all trials. These metrics are then used as heuristics to assign hyperparameters to distributed workers based on their hardware.…
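The assignment idea described in the abstract can be illustrated with a toy sketch. This is not SHADHO's actual algorithm or API: the class names, the `complexity`, `priority`, and `throughput` fields, the product used as a ranking score, and the round-robin pairing are all hypothetical choices made for illustration only. The sketch simply ranks search spaces by a combined score and hands the highest-ranked spaces to the fastest workers first.

```python
from dataclasses import dataclass


@dataclass
class SearchSpace:
    name: str
    complexity: float  # hypothetical relative-complexity score
    priority: float    # hypothetical score derived from observed trial performance


@dataclass
class Worker:
    name: str
    throughput: float  # hypothetical relative hardware speed


def assign_spaces(spaces, workers):
    """Pair higher-scoring spaces with faster workers, round-robin.

    The score (complexity * priority) is an assumption for this sketch,
    not the heuristic defined in the paper.
    """
    ranked_spaces = sorted(
        spaces, key=lambda s: s.complexity * s.priority, reverse=True
    )
    ranked_workers = sorted(workers, key=lambda w: w.throughput, reverse=True)
    assignment = {w.name: [] for w in ranked_workers}
    for i, space in enumerate(ranked_spaces):
        worker = ranked_workers[i % len(ranked_workers)]
        assignment[worker.name].append(space.name)
    return assignment


# Illustrative inputs only; the numbers are made up.
example_spaces = [
    SearchSpace("svm", complexity=2.0, priority=0.5),
    SearchSpace("unet", complexity=10.0, priority=0.9),
    SearchSpace("forest", complexity=4.0, priority=0.5),
]
example_workers = [Worker("cpu", throughput=1.0), Worker("gpu", throughput=8.0)]
example_assignment = assign_spaces(example_spaces, example_workers)
```

With these made-up inputs, the highest-scoring space (`unet`) goes to the fastest worker (`gpu`). A real scheduler would also need to re-rank as new trial results arrive, which this sketch omits.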

Taxonomy

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Support Vector Machine