PrUE: Distilling Knowledge from Sparse Teacher Networks

Shaopu Wang; Xiaojun Chen; Mengzhen Kou; Jinqiao Shi

arXiv:2207.00586 · cs.CV · July 5, 2022

TL;DR

PrUE is a pruning technique that simplifies teacher networks by enlarging their prediction uncertainty, leading to improved knowledge distillation and better student network performance across multiple datasets.

Contribution

This paper introduces PrUE, a novel pruning method that reduces teacher certainty to generate softer labels, enhancing knowledge transfer in model compression.

Findings

01. Students trained with sparse teachers outperform those with full teachers.
02. PrUE enables distillation from deeper networks, improving student accuracy.
03. Method shows consistent gains on CIFAR-10/100, Tiny-ImageNet, and ImageNet.

Abstract

Although deep neural networks have enjoyed remarkable success across a wide variety of tasks, their ever-increasing size also imposes significant overhead on deployment. To compress these models, knowledge distillation was proposed to transfer knowledge from a cumbersome (teacher) network into a lightweight (student) network. However, guidance from a teacher does not always improve the generalization of students, especially when the size gap between student and teacher is large. Previous works argued that it was due to the high certainty of the teacher, resulting in harder labels that were difficult to fit. To soften these labels, we present a pruning method termed Prediction Uncertainty Enlargement (PrUE) to simplify the teacher. Specifically, our method aims to decrease the teacher's certainty about data, thereby generating soft predictions for students. We empirically investigate the…
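
To make the notions of "teacher certainty" and "soft predictions" concrete, the following is a minimal sketch of a standard temperature-scaled distillation loss, together with an entropy measure of how certain a prediction is. It is not the PrUE pruning procedure itself, which the truncated abstract does not spell out. The function names, the logit values, and the `temperature` and `alpha` defaults are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.9):
    """Weighted sum of a soft-label term and a hard-label term.

    soft term: T^2 * KL(teacher_T || student_T)
    hard term: cross-entropy of the student against the true label
    (temperature and alpha are assumed values, not taken from the paper)
    """
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_term = kl_divergence(teacher_soft, student_soft) * temperature ** 2
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1 - alpha) * hard_term


# Hypothetical logits: one very confident teacher, one less certain teacher.
confident_teacher = [9.0, 1.0, 0.5]
uncertain_teacher = [3.0, 1.5, 1.0]
student = [2.0, 1.0, 0.5]

print("entropy (confident teacher):", entropy(softmax(confident_teacher)))
print("entropy (uncertain teacher):", entropy(softmax(uncertain_teacher)))
print("loss vs. confident teacher:", distillation_loss(student, confident_teacher, 0))
print("loss vs. uncertain teacher:", distillation_loss(student, uncertain_teacher, 0))
```

In this toy setup the less certain teacher produces a higher-entropy output distribution, which is the sense of "softer labels" used in the abstract. How PrUE achieves that effect through pruning is described in the paper, not here.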

Code & Models

Repositories

wangshaopu/prue (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Pruning · Knowledge Distillation