Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression
Usma Niyaz, Deepti R. Bathula

TL;DR
This paper introduces a framework that combines knowledge distillation (KD) and mutual learning (ML) under online training. It improves model compression by exploiting peer-to-peer interactions among student networks alongside guidance from a teacher network.
Contribution
The paper proposes a single-teacher, multi-student framework that integrates KD and ML with online training, improving performance on model compression tasks.
Findings
An ensemble of the students outperforms each individual model.
Combining KD and ML yields better results than using either one alone.
The approach is effective on biomedical classification (MSI vs. MSS) and object detection (polyp detection) tasks.
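The ensemble finding can be illustrated with a minimal sketch. The source does not say how the students are combined, so this assumes the simplest rule: average each student's softmax probabilities and predict the class with the highest mean. The function names `softmax` and `ensemble_predict` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(student_logits):
    # Hypothetical combination rule: average the class-probability
    # vectors of all students, then take the arg-max class.
    probs = [softmax(logits) for logits in student_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: avg[c])
    return avg, predicted

# Three students scoring one sample on a two-class task (e.g. MSI vs. MSS).
avg_probs, label = ensemble_predict([[2.0, 0.5], [0.2, 0.4], [1.0, 1.5]])
```

Averaging probabilities lets a confident, correct student outweigh peers that are only mildly wrong, which is one plausible reason an ensemble can beat its members.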
Abstract
Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge, even in the absence of a powerful but static teacher network. Motivated by these findings, we propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance. Furthermore, an online distillation strategy is utilized to train the teacher and students simultaneously. To evaluate the performance of the proposed approach, extensive experiments were conducted using three different versions of teacher-student networks on benchmark biomedical classification (MSI vs. MSS) and object detection (Polyp Detection) tasks. Ensemble of…
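The abstract does not give the training objective. As a hedged sketch, one common way to combine the two ideas is to give each student three terms: a cross-entropy term on the true label, a KD term pulling it toward the teacher's temperature-softened outputs (scaled by T² as in Hinton et al.'s distillation), and an ML term pulling it toward each peer's predictions, as in deep mutual learning. Under online training the teacher is optimized on its own cross-entropy in the same step rather than being frozen. The weights `alpha` and `beta`, the temperature, and all function names below are assumptions, not the authors' settings.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions; p acts as the target.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def student_loss(idx, student_logits, teacher_logits, label,
                 alpha=0.5, beta=0.5, temperature=4.0):
    # Hypothetical per-student objective: CE + alpha * KD + beta * ML.
    own = student_logits[idx]
    ce = cross_entropy(softmax(own), label)
    # KD term: match the teacher's softened distribution (T^2 scaling).
    kd = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(own, temperature)) * temperature ** 2
    # ML term: average KL from each peer's prediction to this student's.
    p_own = softmax(own)
    peers = [l for j, l in enumerate(student_logits) if j != idx]
    ml = (sum(kl_divergence(softmax(peer), p_own) for peer in peers) / len(peers)
          if peers else 0.0)
    return ce + alpha * kd + beta * ml

def teacher_loss(teacher_logits, label):
    # Online distillation: the teacher trains alongside the students
    # on its own supervised loss.
    return cross_entropy(softmax(teacher_logits), label)
```

When a student already agrees with the teacher and all peers, the KD and ML terms vanish and its loss reduces to plain cross-entropy. In a framework such as PyTorch these terms would be computed on batched tensors, and a common design choice is to treat the teacher's and peers' distributions as fixed targets when updating a given student.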
Taxonomy
Topics: AI in cancer detection · Machine Learning and Data Classification · Brain Tumor Detection and Classification
