Pipe-BD: Pipelined Parallel Blockwise Distillation

Hongsun Jang; Jaewon Jung; Jaeyong Song; Joonsang Yu; Youngsok Kim; and Jinho Lee

arXiv:2301.12443 · cs.LG · January 31, 2023

TL;DR

Pipe-BD introduces a pipeline parallelism approach to accelerate blockwise distillation of large neural networks, reducing redundant computations and improving resource utilization without altering the core distillation process.

Contribution

The paper proposes Pipe-BD, a parallelization technique that speeds up blockwise distillation through pipeline and hybrid parallelism, addressing the redundant teacher execution and low GPU utilization of existing data-parallel methods.

Findings

01. Significant acceleration in training time across multiple models and datasets.

02. Improved GPU utilization and resource efficiency.

03. Effective workload balancing with hybrid parallelism.

Abstract

Training large deep neural network models is highly challenging due to their tremendous computational and memory requirements. Blockwise distillation provides one promising method towards faster convergence by splitting a large model into multiple smaller models. In state-of-the-art blockwise distillation methods, training is performed block-by-block in a data-parallel manner using multiple GPUs. To produce inputs for the student blocks, the teacher model is executed from the beginning until the current block under training. However, this results in a high overhead of redundant teacher execution, low GPU utilization, and extra data loading. To address these problems, we propose Pipe-BD, a novel parallelization method for blockwise distillation. Pipe-BD aggressively utilizes pipeline parallelism for blockwise distillation, eliminating redundant teacher block execution and increasing…
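The redundancy described in the abstract can be illustrated with a toy count of teacher block executions. This is a minimal sketch, not the authors' implementation: the teacher "blocks" are plain Python functions, and the names `make_teacher`, `rerun_from_start`, and `forward_once` are hypothetical. It assumes that each student block needs the input and output of its matching teacher block, and that a pipelined schedule can obtain them by running every teacher block once and passing the activation on.

```python
from typing import Callable, List, Tuple

Block = Callable[[float], float]


def make_teacher(n_blocks: int) -> Tuple[List[Block], List[int]]:
    """Build toy teacher blocks plus a per-block call counter (illustrative only)."""
    calls = [0] * n_blocks

    def make_block(i: int) -> Block:
        def block(x: float) -> float:
            calls[i] += 1          # record one execution of teacher block i
            return 0.5 * x + i     # stand-in for a real layer group
        return block

    return [make_block(i) for i in range(n_blocks)], calls


def rerun_from_start(teacher: List[Block], x: float) -> List[Tuple[float, float]]:
    """Block-by-block scheme: for student block i, run the teacher from block 0 up to block i."""
    pairs = []
    for i in range(len(teacher)):
        h = x
        for block in teacher[:i]:          # recompute the whole prefix each time
            h = block(h)
        pairs.append((h, teacher[i](h)))   # (student input, distillation target)
    return pairs


def forward_once(teacher: List[Block], x: float) -> List[Tuple[float, float]]:
    """Pipelined-style scheme: run each teacher block once and hand its output to the next."""
    pairs = []
    h = x
    for block in teacher:
        out = block(h)
        pairs.append((h, out))
        h = out
    return pairs


teacher_a, calls_a = make_teacher(4)
teacher_b, calls_b = make_teacher(4)
pairs_a = rerun_from_start(teacher_a, 1.0)
pairs_b = forward_once(teacher_b, 1.0)

# Both schemes give every student block the same (input, target) pair,
# but the first one executes far more teacher blocks.
print(sum(calls_a), sum(calls_b))  # prints "10 4"
```

With B blocks, the first scheme runs B(B+1)/2 teacher block executions per batch and the second runs B, which is the overhead the abstract refers to as redundant teacher execution. The sketch ignores scheduling, device placement, and data loading, which the paper treats separately.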

Code & Models

Repositories

hongsunjang/pipe-bd (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques