Adaptive Explicit Knowledge Transfer for Knowledge Distillation

Hyungkeun Park; Jong-Seok Lee

arXiv:2409.01679·cs.CV·September 6, 2024

TL;DR

This paper introduces AEKT, a novel knowledge distillation method that adaptively combines explicit and implicit knowledge transfer, improving classification performance on CIFAR-100 and ImageNet.

Contribution

The paper proposes a new loss for adaptive explicit knowledge transfer and a strategy that separates the classification and distillation tasks, making KD more effective.

Findings

1. AEKT outperforms state-of-the-art KD methods on CIFAR-100.

2. AEKT achieves superior results on ImageNet.

3. The method effectively models inter-class relationships.

Abstract

Logit-based knowledge distillation (KD) for classification is cost-efficient compared to feature-based KD but often yields inferior performance. Recently, it was shown that the performance of logit-based KD can be improved by effectively delivering the teacher model's probability distribution over the non-target classes, known as "implicit (dark) knowledge", to the student model. Through gradient analysis, we first show that this has the effect of adaptively controlling the learning of implicit knowledge. Then, we propose a new loss that enables the student to learn explicit knowledge (i.e., the teacher's confidence about the target class) along with implicit knowledge in an adaptive manner. Furthermore, we propose to separate the classification and distillation tasks for effective distillation and inter-class relationship modeling. Experimental results…
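
To make the abstract's two terms concrete, the sketch below splits a softened teacher distribution into explicit knowledge (the target-class confidence) and implicit knowledge (the renormalized distribution over non-target classes), then checks a standard algebraic identity: the usual KD divergence equals an explicit term plus a weighted implicit term. This is not the AEKT loss, which this summary does not spell out. The logits, the temperature of 2.0, and all function names are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def split_knowledge(probs, target):
    """Return (explicit, implicit).

    explicit: probability assigned to the target class.
    implicit: distribution over the non-target classes, renormalized to sum to 1.
    """
    explicit = probs[target]
    rest = 1.0 - explicit
    implicit = [p / rest for i, p in enumerate(probs) if i != target]
    return explicit, implicit

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for a 4-class problem; class 0 is the ground-truth target.
teacher_logits = [4.0, 1.5, 0.5, -1.0]
student_logits = [2.0, 1.0, 0.8, 0.1]
target = 0

t_probs = softmax(teacher_logits, temperature=2.0)
s_probs = softmax(student_logits, temperature=2.0)

t_explicit, t_implicit = split_knowledge(t_probs, target)
s_explicit, s_implicit = split_knowledge(s_probs, target)

# Explicit term: binary KL between (target, not-target) probabilities.
explicit_kl = kl_divergence([t_explicit, 1.0 - t_explicit],
                            [s_explicit, 1.0 - s_explicit])
# Implicit term: KL between the renormalized non-target distributions.
implicit_kl = kl_divergence(t_implicit, s_implicit)
# Classic logit-based KD divergence over all classes.
total_kl = kl_divergence(t_probs, s_probs)

print("teacher target confidence:", t_explicit)
print("explicit KL:", explicit_kl)
print("implicit KL:", implicit_kl)
print("total KL:", total_kl)
```

In this decomposition the implicit term enters the total with weight `1 - t_explicit`, so a very confident teacher contributes little implicit knowledge through the plain KD divergence. That weighting is one way to read the abstract's remark about adaptively controlling how implicit knowledge is learned, though the paper's own gradient analysis may be framed differently.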

Taxonomy

Topics: Neural Networks and Applications

Methods: Knowledge Distillation