Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation
Md. Samiul Alim, Sharjil Khan, Amrijit Biswas, Fuad Rahman, Shafin Rahman, Nabeel Mohammed

TL;DR
This paper introduces a teacher-guided one-shot pruning method that combines knowledge distillation with importance scoring, enabling efficient neural network compression with minimal performance loss.
Contribution
The paper presents a framework that integrates knowledge distillation (KD) into importance score estimation for one-shot global pruning, avoiding the repeated train-prune-retrain cycles that make iterative methods computationally expensive.
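To make "one-shot global pruning" concrete, here is a minimal sketch of the masking step alone, assuming per-weight importance scores have already been computed. The function name `global_prune_masks`, the layer names, and the score values are hypothetical and not taken from the paper. The point it illustrates is that all layers compete for a single sparsity budget, so the resulting per-layer sparsity is generally uneven.

```python
from typing import Dict, List


def global_prune_masks(scores: Dict[str, List[float]], sparsity: float) -> Dict[str, List[int]]:
    """Return 0/1 masks that drop the lowest-scoring fraction of weights network-wide."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Pool every score with its (layer, index) so layers share one global budget.
    pooled = [(s, name, i) for name, layer in scores.items() for i, s in enumerate(layer)]
    n_prune = int(sparsity * len(pooled))
    # Ascending sort; ties are broken by layer name and index, which keeps it deterministic.
    pooled.sort()
    masks = {name: [1] * len(layer) for name, layer in scores.items()}
    for _, name, i in pooled[:n_prune]:
        masks[name][i] = 0
    return masks


# Hypothetical scores for two layers (illustrative values only).
scores = {
    "conv1": [0.9, 0.05, 0.4, 0.7],
    "fc": [0.01, 0.02, 0.8, 0.03],
}
masks = global_prune_masks(scores, sparsity=0.5)
print(masks)
```

In this toy case the global 50% budget removes one of four weights in `conv1` and three of four in `fc`, whereas a per-layer rule would have removed two from each.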
Findings
Achieves high sparsity with minimal accuracy loss across multiple benchmarks.
Outperforms state-of-the-art pruning methods at high sparsity levels.
Offers a more efficient alternative to iterative pruning schemes.
Abstract
Unstructured pruning remains a powerful strategy for compressing deep neural networks, yet it often demands iterative train-prune-retrain cycles, resulting in significant computational overhead. To address this challenge, we introduce a novel teacher-guided pruning framework that tightly integrates Knowledge Distillation (KD) with importance score estimation. Unlike prior approaches that apply KD as a post-pruning recovery step, our method leverages gradient signals informed by the teacher during importance score calculation to identify and retain parameters most critical for both task performance and knowledge transfer. Our method facilitates a one-shot global pruning strategy that efficiently eliminates redundant weights while preserving essential representations. After pruning, we employ sparsity-aware retraining with and without KD to recover accuracy without reactivating pruned…
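The pipeline the abstract describes (teacher-informed gradients for scoring, a single pruning pass, then retraining that keeps pruned weights at zero) can be sketched on a toy linear softmax classifier. This is an illustration under stated assumptions, not the authors' implementation: the summary does not give the scoring formula, so the first-order saliency |w · g|, the loss weighting `alpha`, the temperature `T`, and all data and weight values are assumptions chosen for the example.

```python
import math


def softmax(z, T=1.0):
    m = max(z)
    e = [math.exp((v - m) / T) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def combined_grad(W, x, label, teacher_logits, alpha=0.5, T=2.0):
    """Gradient of (1 - alpha) * CE + alpha * T^2 * KL(teacher || student) w.r.t. W."""
    z = logits(W, x)
    p, p_T, q_T = softmax(z), softmax(z, T), softmax(teacher_logits, T)
    g_z = []
    for k in range(len(z)):
        ce = p[k] - (1.0 if k == label else 0.0)  # cross-entropy term
        kd = T * (p_T[k] - q_T[k])                # distillation term
        g_z.append((1 - alpha) * ce + alpha * kd)
    return [[g * xi for xi in x] for g in g_z]


def importance(W, data, teacher_W, alpha=0.5, T=2.0):
    """Assumed score: |w * g| summed over samples, with g from the combined loss."""
    score = [[0.0] * len(row) for row in W]
    for x, y in data:
        g = combined_grad(W, x, y, logits(teacher_W, x), alpha, T)
        for k, row in enumerate(W):
            for j, w in enumerate(row):
                score[k][j] += abs(w * g[k][j])
    return score


def one_shot_mask(score, sparsity):
    """Zero out the lowest-scoring fraction of weights in a single pass."""
    flat = sorted((s, k, j) for k, row in enumerate(score) for j, s in enumerate(row))
    mask = [[1] * len(row) for row in score]
    for _, k, j in flat[: int(sparsity * len(flat))]:
        mask[k][j] = 0
    return mask


def masked_sgd_step(W, mask, grad, lr=0.1):
    """SGD update multiplied by the mask, so pruned weights are never reactivated."""
    return [[(w - lr * g) * m for w, g, m in zip(wr, gr, mr)]
            for wr, gr, mr in zip(W, grad, mask)]


# Toy student, teacher, and data (all values are illustrative).
student = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.05]]
teacher = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0]]
data = [([1.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1)]

mask = one_shot_mask(importance(student, data, teacher), sparsity=0.5)
W = [[w * m for w, m in zip(wr, mr)] for wr, mr in zip(student, mask)]
for _ in range(20):  # sparsity-aware retraining with KD
    for x, y in data:
        W = masked_sgd_step(W, mask, combined_grad(W, x, y, logits(teacher, x)))
print(mask)
print(W)
```

Setting `alpha=0.0` in the retraining loop gives the variant without KD; the mask is applied identically in both cases.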
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
