PrUE: Distilling Knowledge from Sparse Teacher Networks
Shaopu Wang, Xiaojun Chen, Mengzhen Kou, Jinqiao Shi

TL;DR
PrUE is a pruning technique that simplifies teacher networks by enlarging their prediction uncertainty, leading to improved knowledge distillation and better student network performance across multiple datasets.
Contribution
This paper introduces PrUE, a novel pruning method that reduces teacher certainty to generate softer labels, enhancing knowledge transfer in model compression.
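To make "teacher certainty" concrete, the sketch below scores a prediction by the Shannon entropy of its softmax output: a peaked distribution has low entropy (high certainty), a flatter one has high entropy (a softer label). Entropy is an assumption chosen for illustration; this summary does not say which uncertainty measure PrUE uses, and the logits are made-up values.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits for one sample from two teachers (illustrative only).
confident_logits = [8.0, 1.0, 0.5]  # one class dominates: a hard label
uncertain_logits = [2.0, 1.0, 0.5]  # mass is spread out: a softer label

confident = softmax(confident_logits)
uncertain = softmax(uncertain_logits)

print("confident teacher entropy:", entropy(confident))
print("uncertain teacher entropy:", entropy(uncertain))
```

Under this measure, a pruning method that enlarges prediction uncertainty would move a teacher's outputs from the first kind of distribution toward the second.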
Findings
Students distilled from sparse (pruned) teachers outperform those distilled from dense, unpruned teachers.
PrUE enables distillation from deeper networks, improving student accuracy.
The method shows consistent gains on CIFAR-10/100, Tiny-ImageNet, and ImageNet.
Abstract
Although deep neural networks have enjoyed remarkable success across a wide variety of tasks, their ever-increasing size also imposes significant overhead on deployment. To compress these models, knowledge distillation was proposed to transfer knowledge from a cumbersome (teacher) network into a lightweight (student) network. However, guidance from a teacher does not always improve the generalization of students, especially when the size gap between student and teacher is large. Previous works argued that it was due to the high certainty of the teacher, resulting in harder labels that were difficult to fit. To soften these labels, we present a pruning method termed Prediction Uncertainty Enlargement (PrUE) to simplify the teacher. Specifically, our method aims to decrease the teacher's certainty about data, thereby generating soft predictions for students. We empirically investigate the…
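For readers unfamiliar with the distillation setup the abstract refers to, here is a minimal sketch of the widely used temperature-scaled distillation loss: a KL term between softened teacher and student outputs, mixed with ordinary cross-entropy on the true label. This is the standard textbook formulation, not necessarily the exact objective used in the paper; the function names, `temperature=4.0`, and `alpha=0.9` are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a larger temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.9):
    """Weighted sum of a soft-label term and a hard-label term (assumed weights)."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    # Cross-entropy of the student's unscaled prediction against the true label.
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1.0 - alpha) * hard_term

# Hypothetical logits for a single 3-class sample whose true class is 0.
teacher = [5.0, 1.0, 0.0]
agreeing_student = [5.0, 1.0, 0.0]
disagreeing_student = [0.0, 1.0, 5.0]

print("agreeing student loss:   ", distillation_loss(agreeing_student, teacher, 0))
print("disagreeing student loss:", distillation_loss(disagreeing_student, teacher, 0))
```

In this formulation the teacher's certainty enters only through `soft_teacher`: a very confident teacher yields a near one-hot target, which is the "harder label" the abstract describes as difficult for a small student to fit.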
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Pruning · Knowledge Distillation
