Prune Your Model Before Distill It

Jinhyuk Park; Albert No

arXiv:2109.14960 · cs.LG · July 26, 2022 · 1 cites

TL;DR

This paper introduces a novel approach called 'prune, then distill' which prunes the teacher model before distillation to improve transferability and reduce generalization error, leading to more effective neural network compression.

Contribution

The paper proposes a new framework that prunes the teacher model prior to distillation, providing theoretical insights and demonstrating improved transferability and regularization effects.

Findings

01

Pruned teachers outperform unpruned ones in distillation.

02

Pruning acts as a regularizer reducing generalization error.

03

A compression scheme that forms the student from the pruned teacher and then applies "prune, then distill."

Abstract

Knowledge distillation transfers knowledge from a cumbersome teacher to a small student. Recent results suggest that a student-friendly teacher is better suited to distillation because it provides more transferable knowledge. In this work, we propose a novel framework, "prune, then distill," that first prunes the model to make it more transferable and then distills it to the student. We provide several exploratory examples where the pruned teacher teaches better than the original unpruned network. We further show theoretically that the pruned teacher acts as a regularizer in distillation, which reduces the generalization error. Based on this result, we propose a novel neural network compression scheme in which the student network is formed based on the pruned teacher and the "prune, then distill" strategy is then applied. The code is available at…
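The two stages named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say which pruning criterion or distillation loss the paper uses, so unstructured magnitude pruning and a standard temperature-scaled distillation loss are assumed here, and the function names `magnitude_prune` and `distillation_loss`, along with the `temperature` and `alpha` parameters, are hypothetical.

```python
import math


def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude.

    Assumption: unstructured magnitude pruning stands in for whatever
    pruning procedure the paper actually applies to the teacher.
    """
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    Assumption: a standard temperature-scaled distillation objective;
    `teacher_logits` would come from the pruned teacher.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Stage 1: prune the teacher's weights (toy vector, half of them removed).
pruned = magnitude_prune([0.5, -0.1, 0.05, -2.0], sparsity=0.5)

# Stage 2: train the student against the pruned teacher's logits
# (toy logits here; a real run would compute them from the pruned model).
loss = distillation_loss([1.0, 2.0, 0.5], [0.8, 2.5, 0.1], label=1)
```

In a real pipeline the pruned teacher would typically be retrained before its logits are used as distillation targets; that step is omitted here for brevity.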

Code & Models

Repositories

ososos888/prune-then-distill
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Knowledge Distillation