Cosine Similarity Knowledge Distillation for Individual Class Information Transfer
Gyeongdo Ham, Seonghak Kim, Suin Lee, Jae-Hyeok Lee, and Daeshik Kim

TL;DR
This paper introduces a novel knowledge distillation method using cosine similarity and a weighted temperature adjustment, enabling student models to better mimic teacher models and achieve comparable or superior performance.
Contribution
The paper proposes a cosine similarity-based knowledge distillation technique with a dynamic temperature adjustment, improving information transfer from teacher to student models.
Findings
Outperforms existing KD methods in various experiments
Enables student models to match or surpass teacher performance
Provides a new perspective on model compression using cosine similarity
Abstract
Previous logits-based Knowledge Distillation (KD) methods have utilized predictions about multiple categories within each sample (i.e., class predictions) and have employed Kullback-Leibler (KL) divergence to reduce the discrepancy between the student and teacher predictions. Despite the proliferation of KD techniques, the student model still falls short of the teacher's performance. In response, we introduce a novel and effective KD method capable of achieving results on par with or superior to the teacher model's performance. We utilize teacher and student predictions about multiple samples for each category (i.e., batch predictions) and apply cosine similarity, a commonly used technique in Natural Language Processing (NLP) for measuring the resemblance between text embeddings. This metric's inherent scale-invariance property, which relies solely on vector direction and not…
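The batch-prediction idea in the abstract can be illustrated with a small sketch. The abstract is truncated here, so this is not the authors' exact formulation: the loss form (one minus cosine similarity, averaged over classes), the fixed temperature, and the names `softmax`, `column_cosine`, and `class_wise_cosine_loss` are assumptions made for illustration, and the paper's weighted temperature adjustment is not modeled.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax over classes; `logits` has shape (batch, classes)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def column_cosine(a, b, eps=1e-8):
    """Cosine similarity between matching columns of two (batch, classes) arrays.

    Each column collects one class's predictions across the whole batch,
    i.e. the "batch predictions" the abstract refers to.
    """
    dot = (a * b).sum(axis=0)
    norms = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return dot / (norms + eps)


def class_wise_cosine_loss(student_logits, teacher_logits, temperature=1.0):
    """Assumed loss: mean over classes of (1 - cosine similarity).

    Hypothetical form chosen for illustration; the source does not give the
    exact objective.
    """
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)
    return float(np.mean(1.0 - column_cosine(p_student, p_teacher)))
```

Under these assumptions, the loss is close to zero when the student's per-class batch vectors point in the same direction as the teacher's, regardless of their magnitude. That is the scale-invariance property the abstract mentions: multiplying a column by a positive constant leaves its cosine similarity unchanged.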
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Online Learning and Analytics
Methods: Knowledge Distillation
