Data Efficient Stagewise Knowledge Distillation
Akshay Kulkarni, Navid Panchi, Sharath Chandra Raparthy, Shital Chiddarwar

TL;DR
This paper introduces Stagewise Knowledge Distillation (SKD), a data-efficient, progressive training method that improves the performance of compact neural networks by leveraging teacher knowledge more effectively across classification and segmentation tasks.
Contribution
The paper proposes a stagewise training approach for knowledge distillation that improves data efficiency and outperforms existing distillation methods on classification and semantic segmentation tasks.
Findings
Significant performance gains even when only a fraction of the data is used for distillation.
Outperforms existing knowledge distillation techniques.
Compatible with other model compression methods.
Abstract
Despite the success of Deep Learning (DL), the deployment of modern DL models requiring large computational power poses a significant problem for resource-constrained systems. This necessitates building compact networks that reduce computations while preserving performance. Traditional Knowledge Distillation (KD) methods that transfer knowledge from teacher to student (a) use a single stage and (b) require the whole data set while distilling the knowledge to the student. In this work, we propose a new method called Stagewise Knowledge Distillation (SKD), which builds on traditional KD methods by progressive stagewise training to leverage the knowledge gained from the teacher, resulting in a data-efficient distillation process. We evaluate our method on classification and semantic segmentation tasks. We show, across the tested tasks, significant performance gains even with a fraction of the…
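The abstract does not spell out the training schedule, so the following is only a toy sketch of what "progressive stagewise training" could look like. It assumes each stage fits one student block to the teacher's matching intermediate output while earlier blocks stay frozen, and that only a fraction of the data is used. The scalar "blocks", the names `make_data`, `forward`, and `train_stagewise`, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random


def make_data(n, seed=0):
    """Hypothetical toy inputs: n scalars drawn uniformly from [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def forward(weights, x, upto):
    """Apply the first `upto` scalar blocks; each block multiplies by its weight."""
    for w in weights[:upto]:
        x = w * x
    return x


def train_stagewise(teacher, data, fraction=0.1, epochs=200, lr=0.5):
    """Fit one student block per stage to the teacher's matching feature (assumed scheme)."""
    # Data efficiency: distil on a fraction of the available samples only.
    subset = data[: max(1, int(len(data) * fraction))]
    student = [0.1] * len(teacher)  # arbitrary initialisation for the toy
    for stage in range(1, len(teacher) + 1):
        for _ in range(epochs):
            grad = 0.0
            for x in subset:
                h = forward(student, x, stage - 1)   # output of frozen earlier blocks
                target = forward(teacher, x, stage)  # teacher feature at this stage
                err = student[stage - 1] * h - target
                grad += 2.0 * err * h                # d/dw of the squared error
            # Only the current stage's block is updated.
            student[stage - 1] -= lr * grad / len(subset)
    return student


teacher = [1.2, -0.8, 1.1]
student = train_stagewise(teacher, make_data(200))
print(student)
```

In this toy the student has the same number of blocks as the teacher, so each block can match its target exactly. A real compact student would only approximate the teacher's features, and a final stage with the task loss (omitted here) would normally follow.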
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Knowledge Distillation
