Confidence Conditioned Knowledge Distillation

Sourav Mishra; Suresh Sundaram

arXiv:2107.06993·cs.LG·July 16, 2021


TL;DR

This paper introduces Confidence Conditioned Knowledge Distillation (CCKD), a method that adaptively transfers knowledge from teacher to student models using confidence-based, sample-specific loss functions, improving data efficiency and robustness against adversarial attacks.

Contribution

The paper proposes a novel confidence-conditioned approach to knowledge distillation that dynamically adjusts loss functions and targets based on teacher confidence, enhancing efficiency and robustness.

Findings

01

Achieves generalization performance comparable to or better than state-of-the-art methods.

02

Improves data efficiency by selectively excluding samples from distillation.

03

Increases robustness against adversarial attacks by at least 3% to 6%, depending on the dataset.

Abstract

In this paper, a novel confidence conditioned knowledge distillation (CCKD) scheme for transferring the knowledge from a teacher model to a student model is proposed. Existing state-of-the-art methods employ fixed loss functions for this purpose and ignore the different levels of information that need to be transferred for different samples. In addition to that, these methods are also inefficient in terms of data usage. CCKD addresses these issues by leveraging the confidence assigned by the teacher model to the correct class to devise sample-specific loss functions (CCKD-L formulation) and targets (CCKD-T formulation). Further, CCKD improves the data efficiency by employing self-regulation to stop those samples from participating in the distillation process on which the student model learns faster. Empirical evaluations on several benchmark datasets show that CCKD methods achieve at…
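The idea of conditioning on the teacher's confidence in the correct class can be sketched as follows. This is a minimal illustration, not the paper's formulation: the abstract does not give the exact formulas, so the convex combinations below, the rule used for self-regulation, and all function names (`confidence_weighted_loss`, `confidence_blended_targets`, `active_mask`) are assumptions made for this example.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def teacher_confidence(teacher_probs, labels):
    """Probability the teacher assigns to the correct class, per sample."""
    return teacher_probs[np.arange(len(labels)), labels]


def confidence_weighted_loss(student_logits, teacher_logits, labels, eps=1e-12):
    """Sample-specific loss (in the spirit of CCKD-L; the weighting is assumed).

    A confident teacher pushes the sample toward the soft (teacher) term,
    an unconfident teacher pushes it toward the hard (true-label) term.
    """
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    c = teacher_confidence(p_t, labels)
    idx = np.arange(len(labels))
    hard = -np.log(p_s[idx, labels] + eps)           # cross-entropy vs. true label
    soft = -(p_t * np.log(p_s + eps)).sum(axis=1)    # cross-entropy vs. teacher output
    return c * soft + (1.0 - c) * hard


def confidence_blended_targets(teacher_logits, labels):
    """Sample-specific targets (in the spirit of CCKD-T; the blend is assumed)."""
    p_t = softmax(teacher_logits)
    c = teacher_confidence(p_t, labels)[:, None]
    one_hot = np.eye(p_t.shape[1])[labels]
    return c * p_t + (1.0 - c) * one_hot


def active_mask(student_logits, labels):
    """Hypothetical self-regulation rule: keep only samples the student
    still misclassifies, and drop the rest from distillation."""
    return softmax(student_logits).argmax(axis=1) != labels
```

Under these assumptions, each per-sample loss is multiplied by `active_mask(...)` before averaging, so samples the student has already learned stop contributing to the distillation step.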


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Knowledge Distillation