Boosting Pruned Networks with Linear Over-parameterization
Yu Qian, Jian Cao, Xiaoshuang Li, Jie Zhang, Hufei Li, Jue Chen

TL;DR
This paper introduces a method that improves the accuracy of pruned neural networks by linearly over-parameterizing their layers during fine-tuning and then re-parameterizing them back to the original layers, aided by similarity-preserving knowledge distillation.
Contribution
The paper proposes a novel linear over-parameterization technique, combined with knowledge distillation, to better restore the accuracy of pruned networks.
Findings
Significantly outperforms vanilla fine-tuning on CIFAR-10 and ImageNet.
Especially effective at large pruning ratios.
Enables more accurate fine-tuning of highly compressed networks.
Abstract
Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time. To restore accuracy after pruning, fine-tuning is usually applied to pruned networks. However, too few remaining parameters in pruned networks inevitably bring a great challenge to fine-tuning to restore accuracy. To address this challenge, we propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters and then re-parameterizes them to the original layers after fine-tuning. Specifically, we equivalently expand the convolution/linear layer with several consecutive convolution/linear layers that do not alter the current output feature maps. Furthermore, we utilize similarity-preserving knowledge distillation that encourages the over-parameterized block to learn the immediate…
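The expand-then-merge idea for a single linear layer can be illustrated with a minimal sketch. It is not the authors' implementation: the identity initialization of the second layer, the helper names (expand_linear, merge_linear), and the random perturbation standing in for fine-tuning are all assumptions made for illustration. It shows only that two consecutive linear layers with no nonlinearity between them can start out output-preserving and can later be collapsed into one layer of the original shape.

```python
import random

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def linear(w, b, x):
    """Apply y = W x + b."""
    return [y + bi for y, bi in zip(matvec(w, x), b)]

def expand_linear(w, b):
    """Replace one linear layer by two consecutive ones with the same output.

    Assumption: the second layer starts as the identity with zero bias,
    which is one simple way to keep the output unchanged.
    """
    out_dim = len(w)
    w1, b1 = [row[:] for row in w], b[:]
    w2 = [[1.0 if i == j else 0.0 for j in range(out_dim)] for i in range(out_dim)]
    b2 = [0.0] * out_dim
    return (w1, b1), (w2, b2)

def merge_linear(layer1, layer2):
    """Collapse W2 (W1 x + b1) + b2 into a single layer (W2 W1, W2 b1 + b2)."""
    (w1, b1), (w2, b2) = layer1, layer2
    w = matmul(w2, w1)
    b = [y + bi for y, bi in zip(matvec(w2, b1), b2)]
    return w, b

rng = random.Random(0)
w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # compact 3 -> 2 layer
b = [rng.uniform(-1, 1) for _ in range(2)]
x = [rng.uniform(-1, 1) for _ in range(3)]

# Expansion leaves the output unchanged.
layer1, layer2 = expand_linear(w, b)
y_original = linear(w, b, x)
y_expanded = linear(*layer2, linear(*layer1, x))

# Stand-in for fine-tuning: perturb every weight of both layers in place.
for row in layer1[0] + layer2[0]:
    for j in range(len(row)):
        row[j] += rng.uniform(-0.1, 0.1)

# Merging returns a single layer with the original shape and the tuned output.
y_tuned = linear(*layer2, linear(*layer1, x))
w_merged, b_merged = merge_linear(layer1, layer2)
y_merged = linear(w_merged, b_merged, x)
```

The merge is exact only because nothing nonlinear sits between the two layers; the same algebra is what lets the extra parameters disappear at inference time, so the merged layer costs no more than the pruned one.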
Taxonomy
Topics: Advanced Neural Network Applications · Medical Imaging and Analysis · Domain Adaptation and Few-Shot Learning
Methods: Pruning · Knowledge Distillation
