PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep Learning Clusters

Isabelly Rocha; Nathaniel Morris; Lydia Y. Chen; Pascal Felber; Robert Birke; Valerio Schiavoni

arXiv:2010.00501·cs.DC·October 5, 2020

Open Access

TL;DR

PipeTune is a framework that optimizes deep learning training by tuning hyperparameters and system parameters in parallel, reducing training time and energy consumption.

Contribution

PipeTune introduces a pipelined, parallel tuning approach that covers both hyperparameters and system parameters, improving efficiency over existing methods that tune hyperparameters alone.

Findings

1. Up to 22.6% reduction in tuning time

2. 1.7x speed-up in training time

3. Up to 29% energy savings

Abstract

DNN learning jobs are common in today's clusters due to advances in AI-driven services such as machine translation and image recognition. The most critical phase of these jobs for model performance and learning cost is the tuning of hyperparameters. Existing approaches use techniques such as early stopping criteria to reduce the impact of tuning on learning cost. However, these strategies do not consider the impact that certain hyperparameters and system parameters have on training time. This paper presents PipeTune, a framework for DNN learning jobs that addresses the trade-offs between these two types of parameters. PipeTune takes advantage of the high parallelism and recurring characteristics of such jobs to minimize the learning cost via a pipelined simultaneous tuning of both hyper and system parameters. Our experimental evaluation using three different types of workloads…
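The idea of tuning system parameters inside each hyperparameter trial can be illustrated with a minimal sketch. This is not PipeTune's implementation: the cost model `simulated_epoch_time`, the choice of batch size as the hyperparameter and core count as the system parameter, and the one-configuration-per-early-epoch probing strategy are all assumptions made for illustration.

```python
def simulated_epoch_time(batch_size, cores):
    """Hypothetical cost model (not from the paper): seconds per epoch."""
    compute = 64.0 / cores                 # more cores shorten compute time
    overhead = cores * (32 / batch_size)   # coordination cost grows with cores,
                                           # shrinks with larger batches
    return compute + overhead


def run_trial(batch_size, core_options, epochs):
    """Run one hyperparameter trial while tuning the system parameter.

    The first len(core_options) epochs each probe one core count; the
    remaining epochs reuse the fastest core count observed so far.
    Returns (best_cores, total_time).
    """
    timings = {}
    total = 0.0
    for epoch in range(epochs):
        if epoch < len(core_options):
            cores = core_options[epoch]            # probing phase
            timings[cores] = simulated_epoch_time(batch_size, cores)
            total += timings[cores]
        else:
            cores = min(timings, key=timings.get)  # exploit best config
            total += simulated_epoch_time(batch_size, cores)
    return min(timings, key=timings.get), total


def tune(batch_sizes, core_options, epochs):
    """Evaluate every hyperparameter value; each trial tunes cores itself."""
    return {bs: run_trial(bs, core_options, epochs) for bs in batch_sizes}
```

Under this toy cost model, `tune([32, 128], [2, 4, 8, 16], 6)` selects a different core count for each batch size, which is the kind of interaction between hyperparameters and system parameters that the abstract describes.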

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Data Storage Technologies