EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

Dong Chen; Ning Liu; Yichen Zhu; Zhengping Che; Rui Ma; Fachao Zhang; Xiaofeng Mou; Yi Chang; Jian Tang

arXiv:2402.00084·cs.LG·February 2, 2024·2 cites

Open Access

TL;DR

This paper introduces EPSD, a novel framework that combines early pruning with self-distillation to efficiently compress neural networks, reducing computational costs while maintaining or improving performance across multiple benchmarks.

Contribution

EPSD is the first method to integrate early pruning with self-distillation, preserving distillable weights for more effective model compression without pre-training.

Findings

01

EPSD outperforms existing pruning and self-distillation methods on various benchmarks.

02

EPSD reduces training time and computational costs compared to traditional methods.

03

Pruned networks with EPSD achieve higher accuracy and better distillation quality.

Abstract

Neural network compression techniques, such as knowledge distillation (KD) and network pruning, have received increasing attention. Recent work "Prune, then Distill" reveals that a pruned student-friendly teacher network can benefit the performance of KD. However, the conventional teacher-student pipeline, which entails cumbersome pre-training of the teacher and complicated compression steps, makes pruning with KD less efficient. In addition to compressing models, recent compression techniques also emphasize the aspect of efficiency. Early pruning demands significantly less computational cost in comparison to the conventional pruning methods as it does not require a large pre-trained model. Likewise, a special case of KD, known as self-distillation (SD), is more efficient since it requires no pre-training or student-teacher pair selection. This inspires us to collaborate early pruning…
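The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the EPSD algorithm: the abstract is truncated here and does not specify how weights are selected, so plain magnitude-based masking at initialization stands in for early pruning, and a generic two-view loss stands in for self-distillation. The function names (magnitude_mask, self_distillation_loss) and the parameters alpha and temperature are hypothetical, chosen only for this illustration.

```python
import numpy as np


def magnitude_mask(weights, sparsity):
    """Boolean mask keeping the largest-magnitude (1 - sparsity) fraction.

    Stand-in for early pruning: applied to freshly initialized weights,
    so no pre-trained model is needed. NOT the EPSD selection rule.
    """
    k = int(round(weights.size * (1.0 - sparsity)))
    if k == 0:
        return np.zeros(weights.shape, dtype=bool)
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    return np.abs(weights) >= threshold


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def self_distillation_loss(logits_a, logits_b, labels, alpha=0.5, temperature=2.0):
    """Cross-entropy on view A plus KL toward the same network's view-B output.

    Assumption: the "teacher" signal is the network's own softened
    prediction on a second view of the input, so no separate teacher
    network or pre-training is involved.
    """
    eps = 1e-12
    probs = softmax(logits_a)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))
    teacher = softmax(logits_b, temperature)  # treated as a constant target
    student = softmax(logits_a, temperature)
    kl = np.mean(np.sum(teacher * (np.log(teacher + eps) - np.log(student + eps)), axis=-1))
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl


# Prune a randomly initialized layer to 90% sparsity before any training.
rng = np.random.default_rng(0)
w = rng.standard_normal((10, 10))
mask = magnitude_mask(w, sparsity=0.9)
pruned_w = w * mask
```

In a training loop under these assumptions, the mask would be computed once at initialization and reapplied after every update, while the loss above would be evaluated on the pruned network's outputs for two views of each batch.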

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques

Methods: Pruning · Knowledge Distillation