Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression

Cody Blakeney; Xiaomin Li; Yan Yan; Ziliang Zong

arXiv:2012.03096·cs.LG·December 8, 2020

TL;DR

This paper introduces a parallel blockwise knowledge distillation method that significantly accelerates DNN compression, reducing training time and energy consumption while maintaining model accuracy.

Contribution

The paper proposes a novel parallel blockwise distillation algorithm that uses local information and depthwise separable layers to speed up the compression of complex DNNs.
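As a hedged illustration of why depthwise separable layers make efficient replacement blocks, the sketch below counts the weights in a standard k x k convolution and in a depthwise k x k convolution followed by a 1x1 pointwise convolution. The layer sizes (3x3 kernel, 256 input and 256 output channels) are illustrative assumptions, not configurations taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution (one filter per input channel)
    followed by a 1x1 pointwise convolution that mixes channels (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes; an assumption, not taken from the paper.
    k, c_in, c_out = 3, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(std / sep, 2))  # prints 589824 67840 8.69
```

For these sizes the separable layer needs roughly 8.7x fewer weights. In general, the separable-to-standard weight ratio is 1/c_out + 1/k^2, so the saving grows with the number of output channels.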

Findings

01. Achieves a 3x speedup and 19-29% energy savings on VGG and ResNet distillation.

02. Incurs negligible accuracy loss despite the speedup.

03. Raises the speedup further, to 3.87x, when using a distributed GPU cluster.

Abstract

Deep neural networks (DNNs) have been extremely successful in solving many challenging AI tasks in natural language processing, speech recognition, and computer vision nowadays. However, DNNs are typically computation intensive, memory demanding, and power hungry, which significantly limits their usage on platforms with constrained resources. Therefore, a variety of compression techniques (e.g. quantization, pruning, and knowledge distillation) have been proposed to reduce the size and power consumption of DNNs. Blockwise knowledge distillation is one of the compression techniques that can effectively reduce the size of a highly complex DNN. However, it is not widely adopted due to its long training time. In this paper, we propose a novel parallel blockwise distillation algorithm to accelerate the distillation process of sophisticated DNNs. Our algorithm leverages local information to…
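The sketch below shows, under our own assumptions, why relying on local information allows blockwise distillation to run in parallel. Suppose each student block is trained only on the teacher's input and output activations for the corresponding teacher block. Then no student block has to wait for another, and all blocks can train at once. The toy teacher (three scalar affine blocks), the linear student blocks fitted by gradient descent, and the use of `ThreadPoolExecutor` as a stand-in for one GPU per block are hypothetical simplifications. This is not the authors' implementation in the linked repository, and the paper uses depthwise separable layers, not scalar linear maps, as replacement blocks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy "teacher": three blocks, each a fixed scalar affine function.
TEACHER_BLOCKS = [
    lambda x: 2.0 * x + 1.0,
    lambda x: 0.5 * x - 3.0,
    lambda x: -1.5 * x + 0.25,
]


def teacher_activations(x):
    """Return the teacher's activation at every block boundary: [a0, a1, ..., aN]."""
    acts = [x]
    for block in TEACHER_BLOCKS:
        acts.append(block(acts[-1]))
    return acts


def train_student_block(inputs, targets, lr=0.05, epochs=2000):
    """Fit a linear student block y = w*x + b to (teacher input, teacher output)
    pairs by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(inputs, targets):
            err = (w * x + b) - y
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def distill_blockwise_parallel(data):
    """Train one student block per teacher block, all concurrently."""
    # Record teacher activations once; each block's training set is then local.
    acts = [teacher_activations(x) for x in data]
    jobs = []
    for i in range(len(TEACHER_BLOCKS)):
        inputs = [a[i] for a in acts]       # teacher input to block i
        targets = [a[i + 1] for a in acts]  # teacher output of block i
        jobs.append((inputs, targets))
    # No student block depends on another, so all of them can train at once.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(train_student_block, inp, tgt) for inp, tgt in jobs]
        return [f.result() for f in futures]


def student_forward(students, x):
    """Chain the independently trained student blocks into one network."""
    for w, b in students:
        x = w * x + b
    return x


if __name__ == "__main__":
    data = [-1.0, -0.5, 0.0, 0.5, 1.0]
    students = distill_blockwise_parallel(data)
    for i, (w, b) in enumerate(students):
        print(f"block {i}: w={w:.3f}, b={b:.3f}")
```

Running the script should print student weights close to each teacher block's coefficients. If instead block i+1 were trained on the output of the already-trained student block i, the blocks would have to be trained one after another. That sequential dependency is what the local-information setup removes.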

Code & Models

Repositories

codestar12/Parallel-Independent-Blockwise-Distillation
tf · Official

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation · Dropout · Softmax · 1x1 Convolution · Convolution · Dense Connections · Max Pooling · Kaiming Initialization