Inplace knowledge distillation with teacher assistant for improved training of flexible deep neural networks
Alexey Ozerov, Ngoc Duong

TL;DR
This paper introduces IPKD-TA, a novel training method where sub-models act as teacher assistants to smaller models, improving resource-efficient deep neural network training for image classification tasks.
Contribution
The paper proposes IPKD-TA, a generic training method in which sub-models serve as teacher assistants for smaller sub-models, extending inplace knowledge distillation for flexible neural networks.
Findings
IPKD-TA performs on par with state-of-the-art methods.
IPKD-TA improves training outcomes in most cases.
Demonstrated on MSDNet and Slimmable MobileNet-V1 with CIFAR datasets.
Abstract
Deep neural networks (DNNs) have achieved great success in various machine learning tasks. However, most existing powerful DNN models are computationally expensive and memory demanding, which hinders their deployment on devices with low memory and computational resources or in applications with strict latency requirements. Several resource-adaptable, or flexible, approaches were therefore recently proposed that simultaneously train a big model and several resource-specific sub-models. Inplace knowledge distillation (IPKD) has become a popular method for training such models; it consists of distilling the knowledge from a larger model (teacher) to all other sub-models (students). In this work, a novel generic training method called IPKD with teacher assistant (IPKD-TA) is introduced, where sub-models themselves become teacher assistants teaching smaller sub-models. We evaluated the proposed IPKD-TA…
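The difference between the two schemes described in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the ordering of models from largest to smallest, the temperature parameter, the use of KL divergence as the distillation loss, and in particular the choice that each sub-model is taught by the next larger one under IPKD-TA are all assumptions made here for illustration. Only the distillation term is shown; any supervised loss on ground-truth labels is left out.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_indices(num_models, scheme):
    """Map each student index to its teacher index.

    Models are assumed to be ordered from largest (index 0) to smallest.
    "ipkd":    every sub-model is taught by the largest model.
    "ipkd_ta": every sub-model is taught by the next larger sub-model
               (an assumed reading of the teacher-assistant idea).
    """
    if scheme == "ipkd":
        return {student: 0 for student in range(1, num_models)}
    if scheme == "ipkd_ta":
        return {student: student - 1 for student in range(1, num_models)}
    raise ValueError(f"unknown scheme: {scheme}")


def distillation_loss(all_logits, scheme, temperature=1.0):
    """Sum of teacher-to-student KL terms for one input example.

    all_logits[i] holds the logits produced by model i for that example.
    Teacher outputs are treated as fixed targets.
    """
    loss = 0.0
    for student, teacher in teacher_indices(len(all_logits), scheme).items():
        p_teacher = softmax(all_logits[teacher], temperature)
        p_student = softmax(all_logits[student], temperature)
        loss += kl_divergence(p_teacher, p_student)
    return loss


# Hypothetical logits for one example from three models, largest first.
example_logits = [[2.0, 0.5, -1.0], [1.5, 0.7, -0.5], [0.8, 0.6, 0.1]]
ipkd_value = distillation_loss(example_logits, "ipkd")
ipkd_ta_value = distillation_loss(example_logits, "ipkd_ta")
```

With only two models the two schemes coincide, since the single sub-model can only be taught by the full model; they differ from three models onward, where the smallest sub-model receives its targets from an intermediate sub-model rather than from the largest one.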
Taxonomy
Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation
