Scheduling Optimization Techniques for Neural Network Training
Hyungjun Oh, Junyeol Lee, HyeongJu Kim, Jiwon Seo

TL;DR
This paper introduces scheduling techniques based on out-of-order backprop that improve GPU utilization during neural network training, raising throughput in single-GPU, data-parallel, and pipeline-parallel training.
Contribution
It proposes novel scheduling algorithms based on out-of-order backprop to improve GPU resource utilization in single-GPU, data-parallel, and pipeline-parallel training.
Findings
GPU utilization improved across training modes
Throughput increased for models like BERT and GPT-3
Scheduling algorithms outperform state-of-the-art systems
Abstract
Neural network training requires a large amount of computation, and thus GPUs are often used for acceleration. While they improve performance, GPUs are underutilized during training. This paper proposes out-of-order (ooo) backprop, an effective scheduling technique for neural network training. By exploiting the dependencies of gradient computations, ooo backprop makes it possible to reorder their execution to make the most of the GPU resources. We show that GPU utilization in single-GPU, data-parallel, and pipeline-parallel training can all be improved by applying ooo backprop and prioritizing critical operations. We propose three scheduling algorithms based on ooo backprop. For single-GPU training, we schedule with multi-stream out-of-order computation to mask the kernel launch overhead. In data-parallel training, we reorder the gradient computations to maximize the…
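The reordering idea in the abstract can be illustrated with a small dependency sketch. The abstract does not spell out the dependency structure, so this assumes the usual split of backpropagation into two operations per layer: an input gradient (dX) that the previous layer needs, and a weight gradient (dW) that only the optimizer needs. Under that assumption, the dX operations form a chain, while each dW operation can be deferred. All function names below are hypothetical and chosen for illustration; this is not the paper's scheduler.

```python
# Illustrative sketch only: the dependency model and all names are assumptions,
# not the scheduling algorithms proposed in the paper.

def backward_ops(num_layers):
    """Return {op: set of ops it depends on} for a simple chain of layers.

    For layer i (0 = first layer, num_layers - 1 = last layer):
      ("dX", i): gradient w.r.t. the layer's input, consumed by layer i - 1.
      ("dW", i): gradient w.r.t. the layer's weights, consumed by the optimizer.
    Both need the gradient flowing into layer i, which is ("dX", i + 1).
    """
    deps = {}
    for i in range(num_layers):
        upstream = {("dX", i + 1)} if i + 1 < num_layers else set()
        deps[("dX", i)] = set(upstream)
        deps[("dW", i)] = set(upstream)
    return deps


def is_valid_schedule(schedule, deps):
    """Check that each op runs exactly once and only after its dependencies."""
    done = set()
    for op in schedule:
        if op in done or not deps[op] <= done:
            return False
        done.add(op)
    return done == set(deps)


def conventional_order(num_layers):
    """Layer by layer, starting from the last layer: dW, then dX."""
    order = []
    for i in reversed(range(num_layers)):
        order += [("dW", i), ("dX", i)]
    return order


def deferred_weight_grad_order(num_layers):
    """Run the whole dX chain first, then every dW afterwards."""
    layers = list(reversed(range(num_layers)))
    return [("dX", i) for i in layers] + [("dW", i) for i in layers]


if __name__ == "__main__":
    deps = backward_ops(3)
    print(is_valid_schedule(conventional_order(3), deps))
    print(is_valid_schedule(deferred_weight_grad_order(3), deps))
```

Both orders satisfy every dependency, which is what gives a scheduler room to move the deferrable operations to wherever they best fill idle GPU time. How that freedom is used in each training mode is the subject of the paper's three algorithms and is not modeled here.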
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices
