EPSD: Early Pruning with Self-Distillation for Efficient Model Compression
Dong Chen, Ning Liu, Yichen Zhu, Zhengping Che, Rui Ma, Fachao Zhang, Xiaofeng Mou, Yi Chang, Jian Tang

TL;DR
This paper introduces EPSD, a novel framework that combines early pruning with self-distillation to efficiently compress neural networks, reducing computational costs while maintaining or improving performance across multiple benchmarks.
Contribution
EPSD is the first method to integrate early pruning with self-distillation, preserving distillable weights for more effective model compression without pre-training.
Findings
EPSD outperforms existing pruning and self-distillation methods on various benchmarks.
EPSD reduces training time and computational costs compared to traditional methods.
Pruned networks with EPSD achieve higher accuracy and better distillation quality.
Abstract
Neural network compression techniques, such as knowledge distillation (KD) and network pruning, have received increasing attention. Recent work "Prune, then Distill" reveals that a pruned student-friendly teacher network can benefit the performance of KD. However, the conventional teacher-student pipeline, which entails cumbersome pre-training of the teacher and complicated compression steps, makes pruning with KD less efficient. In addition to compressing models, recent compression techniques also emphasize the aspect of efficiency. Early pruning demands significantly less computational cost in comparison to the conventional pruning methods as it does not require a large pre-trained model. Likewise, a special case of KD, known as self-distillation (SD), is more efficient since it requires no pre-training or student-teacher pair selection. This inspires us to collaborate early pruning…
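The two ingredients the abstract names, a distillation loss and a pruning mask chosen before full training, can be illustrated with a minimal sketch. This is not the EPSD algorithm: the abstract is truncated and does not give the scoring rule or the loss. The function names (`softmax`, `distillation_loss`, `prune_mask`), the temperature-scaled KL loss, and the keep-the-largest-absolute-score rule are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Generic KD loss: T^2 * KL(teacher || student) on softened outputs.

    In self-distillation the "teacher" logits come from the same network
    (for example an earlier snapshot), so no separate pre-trained teacher
    is needed. Which signal EPSD uses is not stated in the abstract.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def prune_mask(scores, sparsity):
    """Binary mask that zeroes the `sparsity` fraction of weights with the
    smallest absolute score (hypothetical scoring rule, for illustration)."""
    n_prune = int(len(scores) * sparsity)
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(scores))]


if __name__ == "__main__":
    # Scores could be, for example, per-weight saliencies computed at initialization.
    print(prune_mask([0.5, -0.1, 0.9, 0.05], sparsity=0.5))
    print(distillation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))
```

In an early-pruning setting the mask would be computed once, at or near initialization, and then held fixed while the sparse network is trained with the distillation term added to the task loss.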
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques
Methods: Pruning · Knowledge Distillation
