Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models

Shang Wang; Peiming Yang; Yuxuan Zheng; Xin Li; Gennady Pekhimenko

arXiv:2102.02344 · cs.LG · March 30, 2021 · 6 cites

TL;DR

This paper introduces HFTA, a framework extension that horizontally fuses multiple deep learning training jobs to improve hardware utilization and significantly increase training throughput on accelerators.

Contribution

The paper proposes HFTA, a novel method to fuse multiple DL training jobs horizontally, enhancing hardware efficiency and throughput for repetitive workloads.

Findings

1. HFTA achieves up to 15.1x higher training throughput.
2. HFTA improves hardware utilization on both GPUs and TPUs.
3. Horizontally fusing same-shaped operators from different models is mathematically equivalent to other, already well-optimized operators.

Abstract

Driven by the tremendous effort in researching novel deep learning (DL) algorithms, the training cost of developing new models has increased staggeringly in recent years. We analyze GPU cluster usage statistics from a top research institute for more insights into the hardware efficiency achieved by typical DL training jobs. Our study reveals that single-accelerator training jobs can dominate the cluster-wide resource consumption when launched repetitively (e.g., for hyper-parameter tuning) while severely under-utilizing the hardware. Fortunately, we observe that such workloads have the following unique characteristics: (i) the models among jobs often have the same types of operators with the same shapes, and (ii) the inter-model horizontal fusion of such operators is mathematically equivalent to other already well-optimized operators. Thus, to help DL researchers and practitioners…
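
Characteristic (ii) can be checked numerically. The sketch below is a minimal NumPy illustration, not HFTA's implementation or API. It assumes N linear layers with identical shapes, as in a hyper-parameter sweep, and the function names `separate_linears` and `fused_linear` are hypothetical. Stacking the layers' weights along a new leading "model" dimension turns N separate matrix multiplications into one batched matmul with identical results. Only the forward pass is shown; gradients and per-model optimizers are not covered.

```python
import numpy as np


def separate_linears(xs, ws, bs):
    """Unfused baseline: run N independent linear layers one at a time."""
    return [x @ w + b for x, w, b in zip(xs, ws, bs)]


def fused_linear(x_stacked, w_stacked, b_stacked):
    """Hypothetical fused version: N same-shaped linear layers as one batched matmul.

    x_stacked: (N, batch, in_features)   one input batch per model
    w_stacked: (N, in_features, out_features)
    b_stacked: (N, out_features)
    """
    # np.matmul treats the leading axis as a batch of independent matrices,
    # so this is a single batched matmul over all N models.
    out = np.matmul(x_stacked, w_stacked)
    # Broadcast each model's bias over its own batch rows.
    return out + b_stacked[:, None, :]


rng = np.random.default_rng(0)
num_models, batch, d_in, d_out = 4, 8, 16, 32

# Each "training job" has its own weights, biases, and inputs.
xs = [rng.standard_normal((batch, d_in)) for _ in range(num_models)]
ws = [rng.standard_normal((d_in, d_out)) for _ in range(num_models)]
bs = [rng.standard_normal(d_out) for _ in range(num_models)]

unfused = separate_linears(xs, ws, bs)
fused = fused_linear(np.stack(xs), np.stack(ws), np.stack(bs))

# The fused result matches the per-model results slice by slice.
for i in range(num_models):
    assert np.allclose(fused[i], unfused[i])
print("fused output shape:", fused.shape)
```

In PyTorch the analogous batched operators are `torch.bmm` and `torch.baddbmm` (the latter folds in the bias). By the same reasoning, N same-shaped convolutions can be expressed as one grouped convolution by concatenating the models' channels and setting the group count to N. Which operator HFTA itself maps each fused layer onto is described in the paper.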

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques