Cosine Similarity Knowledge Distillation for Individual Class Information Transfer

Gyeongdo Ham, Seonghak Kim, Suin Lee, Jae-Hyeok Lee, and Daeshik Kim

arXiv:2311.14307 · cs.CV · November 27, 2023 · 2 cites

Open Access

TL;DR

This paper introduces a novel knowledge distillation method using cosine similarity and a weighted temperature adjustment, enabling student models to better mimic teacher models and achieve comparable or superior performance.

Contribution

The paper proposes a cosine similarity-based knowledge distillation technique with a dynamic temperature adjustment, improving information transfer from teacher to student models.

Findings

01. Outperforms existing KD methods in various experiments

02. Enables student models to match or surpass teacher performance

03. Provides a new perspective on model compression using cosine similarity

Abstract

Previous logits-based Knowledge Distillation (KD) methods have utilized predictions about multiple categories within each sample (i.e., class predictions) and have employed Kullback-Leibler (KL) divergence to reduce the discrepancy between the student and teacher predictions. Despite the proliferation of KD techniques, the student model continues to fall short of the teacher's level of performance. In response, we introduce a novel and effective KD method capable of achieving results on par with or superior to the teacher model's performance. We utilize teacher and student predictions about multiple samples for each category (i.e., batch predictions) and apply cosine similarity, a commonly used technique in Natural Language Processing (NLP) for measuring the resemblance between text embeddings. This metric's inherent scale-invariance property, which relies solely on vector direction and not…
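To make the "batch predictions" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract is truncated, so the loss form (one minus cosine similarity, averaged over classes), the single fixed `temperature`, and all function names are assumptions chosen for illustration. The weighted temperature adjustment mentioned in the TL;DR is not reproduced. The sketch only shows the change of axis: instead of comparing the class distribution of each sample, it compares, for each class, the vector of predictions across the batch.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one sample's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between u and v; unchanged by positive rescaling."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def class_wise_cosine_loss(student_logits, teacher_logits, temperature=1.0):
    """Hypothetical loss: mean over classes of (1 - cosine similarity).

    Both inputs are lists of rows, one row of class logits per sample.
    For each class c, the column of probabilities across the batch is
    treated as one vector (the "batch prediction" for that class).
    """
    s_probs = [softmax(row, temperature) for row in student_logits]
    t_probs = [softmax(row, temperature) for row in teacher_logits]
    num_classes = len(s_probs[0])
    total = 0.0
    for c in range(num_classes):
        s_col = [row[c] for row in s_probs]  # student predictions for class c
        t_col = [row[c] for row in t_probs]  # teacher predictions for class c
        total += 1.0 - cosine_similarity(s_col, t_col)
    return total / num_classes
```

Because cosine similarity depends only on direction, `cosine_similarity([1, 2, 3], [2, 4, 6])` equals 1 up to floating-point error, so under this sketch a student whose per-class batch vector is a positive multiple of the teacher's incurs no loss for that class.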

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Online Learning and Analytics

Methods: Knowledge Distillation