Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning
Seonghak Kim, Gyeongdo Ham, Yucheol Cho, and Daeshik Kim

TL;DR
This paper introduces R2KD, a knowledge distillation method that uses correlation distance and network pruning to improve knowledge transfer from complex teachers to lightweight students, especially on challenging datasets.
Contribution
The paper proposes a new KD approach that combines correlation distance with network pruning, aiming for greater robustness and effectiveness than existing methods.
Findings
Outperforms state-of-the-art KD methods on multiple datasets.
Effectively incorporates data augmentation to improve student model performance.
Demonstrates robustness on datasets that contain challenging or confounding samples.
Abstract
The improvement in the performance of efficient and lightweight models (i.e., the student model) is achieved through knowledge distillation (KD), which involves transferring knowledge from more complex models (i.e., the teacher model). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has certain limitations. First, if the teacher distribution has high entropy, the KL divergence's mode-averaging nature hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, the KL divergence tends to excessively focus on specific modes, which fails to convey an abundant amount of valuable knowledge to the student. Consequently, when dealing with datasets that contain numerous confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore,…
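To make the contrast in the abstract concrete, the sketch below compares the usual KL-divergence distillation term with a correlation-distance term on a single example. It is an illustration under stated assumptions, not the paper's loss: the abstract is truncated before the method is described, so the definition used here (one minus the Pearson correlation between the softened teacher and student probabilities), the temperature of 4.0, and all function names are hypothetical choices.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_teacher, p_student, eps=1e-12):
    """KL(teacher || student), the term most KD methods minimize."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_teacher, p_student)
    )


def correlation_distance(u, v, eps=1e-12):
    """Assumed definition: 1 - Pearson correlation between u and v."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    numerator = sum(a * b for a, b in zip(du, dv))
    denominator = math.sqrt(sum(a * a for a in du)) * math.sqrt(
        sum(b * b for b in dv)
    )
    return 1.0 - numerator / (denominator + eps)


# Hypothetical logits for a 3-class problem.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 0.5, 0.25]

p_t = softmax(teacher_logits, temperature=4.0)
p_s = softmax(student_logits, temperature=4.0)

kl = kl_divergence(p_t, p_s)
cd = correlation_distance(p_t, p_s)
print(f"KL divergence:        {kl:.6f}")
print(f"Correlation distance: {cd:.6f}")
```

Pearson correlation is unchanged by a positive rescaling or shift of either vector, so a distance built on it rewards the student for matching the relative pattern of the teacher's predictions rather than their exact values. Whether R2KD applies the distance to probabilities, to logits, or across a batch is not stated in the text available here.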
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: Focus · Knowledge Distillation
