Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks
Cuong Pham, Tuan Hoang, Thanh-Toan Do

TL;DR
This paper introduces a novel framework combining multi-teacher knowledge distillation and network quantization to effectively train low bit-width deep neural networks, enabling collaborative learning among quantized models.
Contribution
It proposes a new method that integrates multi-teacher knowledge distillation with network quantization, fostering collaborative and mutual learning among quantized teachers and students.
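As a rough illustration of the network-quantization side, the sketch below maps a full-precision value to one of 2^k evenly spaced levels, where k is the bit-width. It is a generic uniform quantizer under assumed settings: a fixed clipping range [lo, hi] and round-to-nearest. The function name quantize_uniform is hypothetical, and the paper's own quantizer may differ.

```python
def quantize_uniform(x: float, bits: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Snap x to the nearest of 2**bits evenly spaced levels in [lo, hi].

    Hypothetical helper: a generic uniform quantizer, not the paper's own.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    intervals = 2 ** bits - 1             # 2**bits levels -> 2**bits - 1 gaps
    step = (hi - lo) / intervals          # spacing between adjacent levels
    clipped = min(max(x, lo), hi)         # values outside [lo, hi] are clipped
    index = round((clipped - lo) / step)  # nearest level index (Python rounds ties to even)
    return lo + index * step


# Example: 2-bit quantization of a few activations, assuming a [0, 1] range.
activations = [-0.2, 0.1, 0.3, 0.55, 0.9, 1.4]
print([round(quantize_uniform(a, bits=2), 3) for a in activations])
# prints [0.0, 0.0, 0.333, 0.667, 1.0, 1.0]
```

With bits=2 only four values (0, 1/3, 2/3, 1) are representable, which is what makes low bit-width students hard to train. In practice such a quantizer is usually paired with a straight-through estimator so gradients can pass through the rounding step; that part is omitted here.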
Findings
Quantized student models outperform some full-precision models.
The method achieves competitive results on CIFAR-100 and ImageNet.
Collaborative learning enhances low bit-width DNN performance.
Abstract
Knowledge distillation, which learns a lightweight student model by distilling knowledge from a cumbersome teacher model, is an attractive approach for learning compact deep neural networks (DNNs). Recent works further improve student network performance by leveraging multiple teacher networks. However, most existing multi-teacher knowledge distillation methods use separately pretrained teachers. This limits collaborative learning between teachers and mutual learning between teachers and the student. Network quantization is another attractive approach for learning compact DNNs. However, most existing network quantization methods are developed and evaluated without considering multi-teacher support to enhance the performance of the quantized student model. In this paper, we propose a novel framework that leverages both multi-teacher knowledge distillation and network…
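To make the distillation objective concrete, here is a minimal sketch of a generic multi-teacher loss: each teacher's logits are softened with a temperature, the resulting distributions are averaged (uniformly unless weights are given), and the student is penalized by the KL divergence from that averaged target. This is a common baseline formulation assumed for illustration, not the collaborative scheme proposed in the paper; the function names and the default temperature of 4.0 are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; larger temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[Sequence[float]],
    temperature: float = 4.0,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Hypothetical multi-teacher KD loss for a single example.

    Teachers' softened distributions are combined by a weighted average
    (uniform by default) and the student is pulled toward that target.
    """
    n = len(teacher_logits)
    if n == 0:
        raise ValueError("need at least one teacher")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("one weight per teacher is required")
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    num_classes = len(student_logits)
    target = [sum(w * p[c] for w, p in zip(weights, teacher_probs)) for c in range(num_classes)]
    student_probs = softmax(student_logits, temperature)
    # Scale by T^2 (the Hinton et al. convention) so gradient magnitudes stay
    # comparable when the temperature changes.
    return temperature ** 2 * kl_divergence(target, student_probs)


student = [2.0, 0.5, -1.0]
teachers = [[3.0, 0.0, -2.0], [1.0, 1.5, -1.0]]
print(multi_teacher_kd_loss(student, teachers))
```

In a full training loop this term would typically be added to the ordinary cross-entropy loss on the ground-truth labels. The paper's framework instead lets quantized teachers and the student learn collaboratively rather than relying on separately pretrained teachers, which this standalone loss does not model.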
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: Knowledge Distillation
