Confidence Conditioned Knowledge Distillation
Sourav Mishra, Suresh Sundaram

TL;DR
This paper introduces Confidence Conditioned Knowledge Distillation (CCKD), a method that adaptively transfers knowledge from teacher to student models using confidence-based, sample-specific loss functions, improving data efficiency and robustness against adversarial attacks.
Contribution
The paper proposes a novel confidence-conditioned approach to knowledge distillation that dynamically adjusts loss functions and targets based on teacher confidence, enhancing efficiency and robustness.
Findings
Achieves generalization performance comparable to or better than state-of-the-art methods.
Improves data efficiency by selectively excluding samples from distillation.
Increases robustness against adversarial attacks by at least 3-6%, depending on the dataset.
Abstract
In this paper, a novel confidence conditioned knowledge distillation (CCKD) scheme for transferring the knowledge from a teacher model to a student model is proposed. Existing state-of-the-art methods employ fixed loss functions for this purpose and ignore the different levels of information that need to be transferred for different samples. In addition to that, these methods are also inefficient in terms of data usage. CCKD addresses these issues by leveraging the confidence assigned by the teacher model to the correct class to devise sample-specific loss functions (CCKD-L formulation) and targets (CCKD-T formulation). Further, CCKD improves the data efficiency by employing self-regulation to stop those samples from participating in the distillation process on which the student model learns faster. Empirical evaluations on several benchmark datasets show that CCKD methods achieve at…
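The abstract describes sample-specific losses driven by the teacher's confidence in the correct class, but it does not state the formula. The sketch below is one illustrative reading, not the authors' CCKD-L definition. It assumes the confidence acts as a per-sample weight that favors the hard-label loss when the teacher is confident and the teacher-matching loss otherwise. The function name `confidence_weighted_loss`, the parameter `skip_threshold`, and the thresholding rule used as a stand-in for self-regulation are all hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def confidence_weighted_loss(student_logits, teacher_logits, labels, skip_threshold=None):
    """Blend hard-label cross-entropy and teacher-matching KL per sample.

    ASSUMPTION (not from the source): the blend weight is the teacher's
    probability for the correct class, and samples are dropped once the
    student's own probability for the correct class reaches skip_threshold.
    """
    eps = 1e-12
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    rows = np.arange(len(labels))

    # Teacher confidence assigned to the correct class, one value per sample.
    conf = p_teacher[rows, labels]

    # Hard-label cross-entropy and KL(teacher || student), both per sample.
    ce = -np.log(p_student[rows, labels] + eps)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)), axis=1)

    per_sample = conf * ce + (1.0 - conf) * kl

    # Hypothetical stand-in for self-regulation: mask samples the student
    # already predicts correctly with high probability.
    if skip_threshold is None:
        active = np.ones(len(labels), dtype=bool)
    else:
        active = p_student[rows, labels] < skip_threshold

    n_active = max(int(active.sum()), 1)
    loss = float((per_sample * active).sum() / n_active)
    return loss, conf, active

# Toy batch: the teacher is confident on sample 0 and uncertain on sample 1.
teacher = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
student = teacher.copy()
labels = np.array([0, 0])
loss, conf, active = confidence_weighted_loss(student, teacher, labels, skip_threshold=0.9)
```

In this toy batch the student already matches the teacher, so sample 0 is masked out and only the uncertain sample contributes to the loss. A real training loop would compute the same quantities with a differentiable framework and would follow the paper's actual definitions of the loss and the exclusion rule.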
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Knowledge Distillation
