Data Efficient Stagewise Knowledge Distillation

Akshay Kulkarni; Navid Panchi; Sharath Chandra Raparthy; Shital Chiddarwar

arXiv:1911.06786 · cs.LG · June 24, 2020 · 5 cites

TL;DR

This paper introduces Stagewise Knowledge Distillation (SKD), a data-efficient, progressive training method that improves the performance of compact neural networks by leveraging teacher knowledge more effectively across classification and segmentation tasks.

Contribution

The paper proposes a stagewise training approach to knowledge distillation that improves data efficiency and outperforms existing distillation methods on classification and semantic segmentation tasks.

Findings

01. Significant performance gains even when only a fraction of the training data is used for distillation.
02. Outperforms existing knowledge distillation techniques.
03. Compatible with other model compression methods.

Abstract

Despite the success of Deep Learning (DL), deploying modern DL models that require large computational power poses a significant problem for resource-constrained systems. This necessitates building compact networks that reduce computation while preserving performance. Traditional Knowledge Distillation (KD) methods, which transfer knowledge from a teacher to a student, (a) use a single stage and (b) require the whole dataset while distilling knowledge to the student. In this work, we propose a new method called Stagewise Knowledge Distillation (SKD), which builds on traditional KD methods through progressive stagewise training to leverage the knowledge gained from the teacher, resulting in a data-efficient distillation process. We evaluate our method on classification and semantic segmentation tasks. We show, across the tested tasks, significant performance gains even with a fraction of the…
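The abstract describes SKD only as "progressive stagewise training". The following is a minimal, dependency-free sketch of that idea under assumptions not stated on this page: (i) teacher and student are split into the same number of stages, (ii) each student stage is trained to reproduce the output of the corresponding teacher stage with a mean-squared-error loss while the earlier student stages stay frozen, and (iii) stages are trained in order from input to output. The scalar affine "stages", the helpers `forward`, `train_stage` and `stagewise_distill`, and all numbers are hypothetical illustrations, not the authors' code; a real implementation would use network blocks inside a deep-learning framework.

```python
from typing import List, Tuple

# Hypothetical toy model: every "stage" is a scalar affine map h -> w * h + b.
# In a real network a stage would be a block of layers; scalars keep this
# sketch dependency-free.
Stage = Tuple[float, float]  # (weight, bias)


def forward(stages: List[Stage], x: float, upto: int) -> float:
    """Pass x through the first `upto` stages and return that intermediate output."""
    h = x
    for w, b in stages[:upto]:
        h = w * h + b
    return h


def train_stage(student: List[Stage], teacher: List[Stage], k: int,
                inputs: List[float], lr: float = 0.1, steps: int = 2000) -> float:
    """Fit student stage k to the teacher's stage-k output with an MSE loss.

    Student stages 0..k-1 are frozen: they only produce the input to stage k
    and their parameters are never updated here. Returns the final MSE.
    """
    h_in = [forward(student, x, k) for x in inputs]         # frozen earlier stages
    targets = [forward(teacher, x, k + 1) for x in inputs]  # teacher's stage-k output
    w, b = student[k]
    n = len(inputs)
    for _ in range(steps):
        # Plain gradient descent on mean((w * h + b - t) ** 2).
        errs = [w * h + b - t for h, t in zip(h_in, targets)]
        grad_w = sum(2.0 * e * h for e, h in zip(errs, h_in)) / n
        grad_b = sum(2.0 * e for e in errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    student[k] = (w, b)
    return sum((w * h + b - t) ** 2 for h, t in zip(h_in, targets)) / n


def stagewise_distill(teacher: List[Stage],
                      inputs: List[float]) -> Tuple[List[Stage], List[float]]:
    """Train an untrained student one stage at a time, from input to output."""
    student: List[Stage] = [(1.0, 0.0) for _ in teacher]
    losses: List[float] = []
    for k in range(len(teacher)):
        losses.append(train_stage(student, teacher, k, inputs))
    return student, losses


if __name__ == "__main__":
    teacher = [(2.0, 1.0), (-1.0, 0.5), (0.5, -2.0)]  # made-up "pretrained" teacher
    inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]              # small, made-up distillation set
    student, losses = stagewise_distill(teacher, inputs)
    print(student, losses)
```

Because this toy student has the same capacity per stage as the teacher, each stage recovers the teacher's parameters almost exactly; a real, smaller student would only approximate the teacher's intermediate features. Training a final task head on ground-truth labels after the feature stages, which a full pipeline would likely need, is omitted here as an assumption-dependent detail.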

Code & Models

Repositories

IvLabs/stagewise-knowledge-distillation (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation