Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks

Cuong Pham; Tuan Hoang; Thanh-Toan Do

arXiv:2210.16103 · cs.CV · October 31, 2022 · 1 citation

TL;DR

This paper introduces a novel framework combining multi-teacher knowledge distillation and network quantization to effectively train low bit-width deep neural networks, enabling collaborative learning among quantized models.

Contribution

It proposes a new method that integrates multi-teacher knowledge distillation with network quantization, fostering collaborative and mutual learning among quantized teachers and students.
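To make the multi-teacher distillation idea concrete, here is a minimal sketch of a distillation loss computed against several teachers at once. It is not the paper's formulation: averaging the teachers' soft targets uniformly, the temperature value, and the function names (`softmax`, `kl_divergence`, `multi_teacher_kd_loss`) are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_kd_loss(student_logits, teacher_logits_list, temperature=4.0):
    """Distillation loss against the averaged soft targets of several teachers.

    Assumption: teachers are weighted uniformly; the paper may combine
    them differently.
    """
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n = len(teacher_probs)
    # Average the teachers' distributions class by class.
    avg_target = [sum(col) / n for col in zip(*teacher_probs)]
    student_probs = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl_divergence(avg_target, student_probs)
```

In a training loop, a term like this would typically be added to the ordinary cross-entropy loss on the ground-truth labels; how the two are weighted is a design choice not stated on this page.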

Findings

1. Quantized student models outperform some full-precision models.
2. The method achieves competitive results on CIFAR100 and ImageNet.
3. Collaborative learning enhances low bit-width DNN performance.

Abstract

Knowledge distillation, which learns a lightweight student model by distilling knowledge from a cumbersome teacher model, is an attractive approach for learning compact deep neural networks (DNNs). Recent works further improve student network performance by leveraging multiple teacher networks. However, most existing knowledge distillation-based multi-teacher methods use separately pretrained teachers. This limits the collaborative learning between teachers and the mutual learning between teachers and student. Network quantization is another attractive approach for learning compact DNNs. However, most existing network quantization methods are developed and evaluated without considering multi-teacher support to enhance the performance of the quantized student model. In this paper, we propose a novel framework that leverages both multi-teacher knowledge distillation and network…
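As a rough illustration of what "low bit-width" means for network weights, the sketch below rounds a list of values onto 2^bits evenly spaced levels between their minimum and maximum. This is a generic min-max uniform quantizer chosen for illustration; the quantization scheme actually used in the paper is not described on this page, and the function name `quantize_uniform` is hypothetical.

```python
def quantize_uniform(values, bits):
    """Round each value to the nearest of 2**bits evenly spaced levels.

    Assumption: levels span [min(values), max(values)]; real quantizers
    often learn or clip this range instead.
    """
    levels = 2 ** bits - 1  # number of intervals between levels
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is nothing to quantize.
        return list(values)
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]
```

With `bits=2`, for example, every weight is mapped to one of at most four distinct values, which is what allows the model to be stored and executed with far fewer bits per parameter than a full-precision network.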

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Knowledge Distillation