HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism

Jay H. Park; Gyeongchan Yun; Chang M. Yi; Nguyen T. Nguyen; Seungmin Lee; Jaesik Choi; Sam H. Noh; Young-ri Choi

arXiv:2005.14038·cs.DC·May 29, 2020·41 cites

TL;DR

HetPipe is a system that combines pipelined model parallelism and data parallelism to enable efficient training of large DNNs on heterogeneous GPU clusters, including low-power GPUs, achieving up to 49% faster convergence.

Contribution

HetPipe integrates pipelined model parallelism (PMP) with data parallelism (DP), proposes a new parameter synchronization model called WSP, and demonstrates faster training on heterogeneous GPU clusters.

Findings

1. Achieves up to 49% faster convergence compared to existing methods.

2. Successfully integrates PMP and DP for heterogeneous GPU training.

3. Provides convergence proof for the proposed synchronization model.

Abstract

Deep Neural Network (DNN) models have continuously been growing in size in order to improve the accuracy and quality of the models. Moreover, for training of large DNN models, the use of heterogeneous GPUs is inevitable due to the short release cycle of new GPU architectures. In this paper, we investigate how to enable training of large DNN models on a heterogeneous GPU cluster that possibly includes whimpy GPUs that, as a standalone, could not be used for training. We present a DNN training system, HetPipe (Heterogeneous Pipeline), that integrates pipelined model parallelism (PMP) with data parallelism (DP). In HetPipe, a group of multiple GPUs, called a virtual worker, processes minibatches in a pipelined manner, and multiple such virtual workers employ data parallelism for higher performance. We also propose a novel parameter synchronization model, which we refer to as Wave…
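The arrangement described in the abstract (GPUs grouped into virtual workers, minibatches pipelined inside each virtual worker, and data parallelism across virtual workers) can be illustrated with a small sketch. This is not HetPipe's implementation: every function name below is hypothetical, the grouping is simple fixed-size chunking rather than the paper's resource allocation, the schedule covers forward passes only, and parameter synchronization is left out entirely.

```python
from typing import List, Tuple


def make_virtual_workers(gpus: List[str], gpus_per_worker: int) -> List[List[str]]:
    """Group GPUs into fixed-size virtual workers.

    Assumption: plain chunking in list order; the paper's actual
    allocation policies are not modeled here.
    """
    if gpus_per_worker <= 0 or len(gpus) % gpus_per_worker != 0:
        raise ValueError("GPU count must be a positive multiple of gpus_per_worker")
    return [gpus[i:i + gpus_per_worker] for i in range(0, len(gpus), gpus_per_worker)]


def forward_pipeline_schedule(num_stages: int, num_minibatches: int) -> List[List[Tuple[int, int]]]:
    """Return, per time step, the (stage, minibatch) pairs that are busy.

    Assumption: one unit of time per stage and forward passes only,
    so minibatch m reaches stage s at time step m + s.
    """
    steps = []
    for t in range(num_stages + num_minibatches - 1):
        active = [(s, t - s) for s in range(num_stages) if 0 <= t - s < num_minibatches]
        steps.append(active)
    return steps


def shard_minibatches(num_minibatches: int, num_workers: int) -> List[List[int]]:
    """Assign minibatch indices to virtual workers round-robin (data parallelism)."""
    return [list(range(w, num_minibatches, num_workers)) for w in range(num_workers)]


if __name__ == "__main__":
    # Hypothetical GPU labels for a mixed cluster.
    workers = make_virtual_workers(["V0", "V1", "R0", "R1"], gpus_per_worker=2)
    print("virtual workers:", workers)
    print("minibatch shards:", shard_minibatches(5, len(workers)))
    for t, active in enumerate(forward_pipeline_schedule(num_stages=2, num_minibatches=3)):
        print(f"t={t}: {active}")
```

In the schedule, several minibatches are in flight at once within a single virtual worker, which is the sense in which a group of GPUs "processes minibatches in a pipelined manner"; the shards show how distinct virtual workers would each receive different minibatches.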

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Stochastic Gradient Optimization Techniques