Scheduling Optimization Techniques for Neural Network Training

Hyungjun Oh; Hyungjun Oh; HyeongJu Kim; Jiwon Seo

arXiv:2110.00929·cs.LG·October 5, 2021·1 cites

Open Access

TL;DR

This paper introduces scheduling techniques based on out-of-order backprop that improve GPU utilization during neural network training and raise throughput in single-GPU, data-parallel, and pipeline-parallel training.

Contribution

The paper proposes three scheduling algorithms based on out-of-order backprop to improve GPU resource utilization in single-GPU, data-parallel, and pipeline-parallel training.

Findings

01. GPU utilization improved across training modes

02. Throughput increased for models like BERT and GPT-3

03. Scheduling algorithms outperform state-of-the-art systems

Abstract

Neural network training requires a large amount of computation, and thus GPUs are often used for acceleration. While they improve performance, GPUs are underutilized during training. This paper proposes out-of-order (ooo) backprop, an effective scheduling technique for neural network training. By exploiting the dependencies of gradient computations, ooo backprop makes it possible to reorder their execution to make the most of the GPU resources. We show that GPU utilization in single-GPU, data-parallel, and pipeline-parallel training can be commonly improved by applying ooo backprop and prioritizing critical operations. We propose three scheduling algorithms based on ooo backprop. For single-GPU training, we schedule with multi-stream out-of-order computation to mask the kernel launch overhead. In data-parallel training, we reorder the gradient computations to maximize the…
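The reordering idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation and does not model GPUs, streams, or any of the three scheduling algorithms. It assumes the usual dependency structure of backpropagation through a chain of bias-free linear layers: each layer's input gradient is needed by the layer before it, while its weight gradient is not needed by any other layer and can therefore be deferred. All function names here are hypothetical.

```python
# Illustrative sketch only: a chain of bias-free linear layers y = x @ W,
# written in plain Python so the dependency structure is explicit.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def forward(x, weights):
    """Return the input seen by each layer, plus the final output."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = matmul(x, w)
    return inputs, x

def backward_in_order(inputs, weights, grad_out):
    """Conventional order: per layer, weight gradient, then input gradient."""
    grads = [None] * len(weights)
    for i in reversed(range(len(weights))):
        grads[i] = matmul(transpose(inputs[i]), grad_out)    # dW_i
        grad_out = matmul(grad_out, transpose(weights[i]))   # dX_i
    return grads

def backward_out_of_order(inputs, weights, grad_out):
    """Reordered: run the input-gradient chain first, defer weight gradients."""
    deferred = []
    for i in reversed(range(len(weights))):
        deferred.append((i, grad_out))                       # keep dY_i for later
        grad_out = matmul(grad_out, transpose(weights[i]))   # dependency chain
    grads = [None] * len(weights)
    # The deferred weight gradients are mutually independent, so any order
    # is valid; this sketch happens to compute the first layer's first.
    for i, g in reversed(deferred):
        grads[i] = matmul(transpose(inputs[i]), g)
    return grads

weights = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, -1.0], [2.0, 1.0]]]
inputs, out = forward([[1.0, -2.0]], weights)
grad_out = [[1.0, 1.0]]
same = backward_in_order(inputs, weights, grad_out) == backward_out_of_order(
    inputs, weights, grad_out
)
```

Both functions produce identical weight gradients, which is what gives a scheduler the freedom to choose when each weight-gradient computation runs.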

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices