Subclass Knowledge Distillation with Known Subclass Labels
Ahmad Sajedi, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

TL;DR
This paper introduces Subclass Knowledge Distillation (SKD), a framework that leverages subclass information within classes to improve the performance of student models in classification tasks, demonstrated on clinical colorectal polyp detection.
Contribution
The work presents a novel SKD framework that transfers subclass knowledge from teacher to student, improving the student's performance by using information in the subclass logits that the class logits do not capture.
Findings
The student trained with SKD achieved an F1-score of 85.05%.
SKD improved student performance by 1.47% over conventional KD.
Extra subclass knowledge corresponds to 0.4656 label bits per sample.
Abstract
This work introduces a novel knowledge distillation framework for classification tasks where information on existing subclasses is available and taken into consideration. In classification tasks with a small number of classes or binary detection, the amount of information transferred from the teacher to the student is restricted, thus limiting the utility of knowledge distillation. Performance can be improved by leveraging information of possible subclasses within the classes. To that end, we propose the so-called Subclass Knowledge Distillation (SKD), a process of transferring the knowledge of predicted subclasses from a teacher to a smaller student. Meaningful information that is not in the teacher's class logits but exists in subclass logits (e.g., similarities within classes) will be conveyed to the student through the SKD, which will then boost the student's performance.…
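To make the idea in the abstract concrete, here is a minimal sketch of how subclass logits might carry more information than class logits. It is not the authors' implementation: the temperature-scaled softmax, the KL-divergence loss, the subclass-to-class mapping, and all names and numbers below are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(subclass_probs, subclass_to_class, num_classes):
    """Sum subclass probabilities into their parent classes."""
    out = [0.0] * num_classes
    for p, c in zip(subclass_probs, subclass_to_class):
        out[c] += p
    return out

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def subclass_distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Assumed loss: KL between softened teacher and student subclass distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

# Hypothetical setup: a binary task where each class has two known subclasses.
subclass_to_class = [0, 0, 1, 1]
teacher_logits = [3.0, 2.5, -1.0, -2.0]   # made-up values
student_logits = [2.0, 0.5, -0.5, -1.0]   # made-up values

teacher_class = class_probs(softmax(teacher_logits), subclass_to_class, 2)
loss = subclass_distillation_loss(teacher_logits, student_logits)
print("teacher class probabilities:", teacher_class)
print("subclass distillation loss:", loss)
```

In this sketch the teacher's two class probabilities say little beyond "class 0 is likely", while its four subclass probabilities also show how the mass is split within each class, which is the kind of within-class similarity the abstract refers to.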
Taxonomy
Methods: Knowledge Distillation
