BicKD: Bilateral Contrastive Knowledge Distillation

Jiangnan Zhu; Yukai Xu; Li Xiong; Yixuan Liu; Junxu Liu; Hong kyu Lee; Yujie Gu

arXiv:2602.01265·cs.LG·May 1, 2026

TL;DR

BicKD introduces a bilateral contrastive loss for knowledge distillation, enabling class-wise and sample-wise comparison, which improves transfer performance over traditional methods.

Contribution

This work proposes a novel bilateral contrastive loss for KD that emphasizes class-wise orthogonality and improves knowledge transfer effectiveness.

Findings

01 BicKD outperforms state-of-the-art KD methods across various benchmarks.

02 The bilateral contrastive loss enhances class separation and distribution structure.

03 Experiments demonstrate improved accuracy and transfer efficiency.

Abstract

Knowledge distillation (KD) is a machine learning framework that transfers knowledge from a teacher model to a student model. The vanilla KD proposed by Hinton et al. has been the dominant approach in logit-based distillation and demonstrates compelling performance. However, it only performs sample-wise probability alignment between the teacher's and the student's predictions, and it lacks a mechanism for class-wise comparison. In addition, vanilla KD imposes no structural constraint on the probability space. In this work, we propose a simple yet effective methodology, bilateral contrastive knowledge distillation (BicKD). This approach introduces a novel bilateral contrastive loss, which intensifies the orthogonality among different class generalization spaces while preserving consistency within the same class. The bilateral formulation enables explicit comparison of both sample-wise and class-wise…
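
To make the sample-wise versus class-wise distinction concrete, here is a minimal NumPy sketch. The first function is the standard temperature-scaled KL term of vanilla KD, which compares teacher and student one sample (row) at a time. The second function is not the BicKD loss, whose exact form is not given in the abstract above; it is only a hypothetical illustration of what a class-wise (column) view of the same probability matrix looks like, using cosine similarity between class columns. All function names, the temperature value, and the toy shapes are assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Row-wise log-softmax of temperature-scaled logits (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def vanilla_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Vanilla KD term: KL(teacher || student) on softened predictions.

    Each row (sample) is compared independently, which is the
    sample-wise alignment the abstract refers to.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl_per_sample = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(temperature**2 * kl_per_sample.mean())


def class_cosine_gram(probs):
    """Hypothetical class-wise view (NOT the paper's loss).

    Treats each column of the (batch, classes) probability matrix as a
    class vector and returns the cosine similarity between every pair.
    Off-diagonal entries near 0 would mean near-orthogonal class columns.
    """
    cols = probs / np.linalg.norm(probs, axis=0, keepdims=True)
    return cols.T @ cols


# Toy data: 8 samples, 5 classes (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(8, 5))
student_logits = rng.normal(size=(8, 5))

kd = vanilla_kd_loss(student_logits, teacher_logits)
gram = class_cosine_gram(np.exp(log_softmax(student_logits)))
print("vanilla KD loss:", kd)
print("class-column cosine Gram shape:", gram.shape)
```

The row-wise loss never looks at how one class's predictions relate to another's across the batch, whereas the Gram matrix exposes exactly that relationship. How BicKD actually combines the two views should be taken from the paper itself.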
