Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning

Seonghak Kim, Gyeongdo Ham, Yucheol Cho, and Daeshik Kim

arXiv:2311.13934 · cs.CV · May 20, 2025 · 1 citation

Open Access

TL;DR

This paper introduces R2KD, a knowledge distillation method that uses correlation distance and network pruning to improve the transfer of knowledge from complex teachers to lightweight students, especially on challenging datasets.

Contribution

The paper proposes a new KD approach combining correlation distance and network pruning to enhance robustness and effectiveness over existing methods.

Findings

01 Outperforms state-of-the-art KD methods on multiple datasets.

02 Effectively incorporates data augmentation to improve student model performance.

03 Demonstrates robustness on challenging and confounding datasets.

Abstract

Knowledge distillation (KD) improves the performance of efficient, lightweight models (the student) by transferring knowledge from more complex models (the teacher). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has two limitations. First, when the teacher distribution has high entropy, the mode-averaging nature of KL divergence hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, KL divergence tends to focus excessively on specific modes and fails to convey much of the valuable knowledge to the student. Consequently, on datasets that contain many confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore,…

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Focus · Knowledge Distillation