Enriching Knowledge Distillation with Cross-Modal Teacher Fusion
Amir M. Mansourian, Amir Mohammad Babaei, Shohreh Kasaei

TL;DR
This paper introduces RichKD, a knowledge distillation framework that fuses traditional teacher outputs with CLIP's vision-language representations, enhancing model accuracy, confidence, and robustness through cross-modal fusion.
Contribution
It proposes a simple, novel framework that integrates CLIP's multi-modal knowledge into KD, improving performance and reliability over existing unimodal methods.
Findings
RichKD outperforms existing baselines across multiple benchmarks.
Fusion with CLIP enhances model confidence and reduces errors.
The method improves robustness under distribution shifts and corruptions.
Abstract
Multi-teacher knowledge distillation (KD), a more effective technique than traditional single-teacher methods, transfers knowledge from expert teachers to a compact student model using logit or feature matching. However, most existing approaches lack knowledge diversity, as they rely solely on unimodal visual information, overlooking the potential of cross-modal representations. In this work, we explore the use of CLIP's vision-language knowledge as a complementary source of supervision for KD, an area that remains largely underexplored. We propose a simple yet effective framework that fuses the logits and features of a conventional teacher with those from CLIP. By incorporating CLIP's multi-prompt textual guidance, the fused supervision captures both dataset-specific and semantically enriched visual cues. Beyond accuracy, analysis shows that the fused teacher yields more confident and…
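The logit-level fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so everything specific here is an assumption rather than the authors' method: a convex combination controlled by a hypothetical weight `alpha`, a temperature-scaled KL distillation loss in the usual KD style, and the helper names `fuse_logits` and `kd_loss`. Feature-level fusion and CLIP's multi-prompt averaging are not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(teacher_logits, clip_logits, alpha=0.5):
    """Blend conventional-teacher and CLIP logits.

    ASSUMPTION: a convex combination with weight `alpha`; the paper's
    actual fusion rule is not given in the abstract.
    """
    if len(teacher_logits) != len(clip_logits):
        raise ValueError("teacher and CLIP logits must cover the same classes")
    return [alpha * t + (1.0 - alpha) * c
            for t, c in zip(teacher_logits, clip_logits)]


def kd_loss(student_logits, target_logits, temperature=4.0):
    """Temperature-scaled KL(target || student), scaled by T^2."""
    p = softmax(target_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Illustrative values only; these are not results from the paper.
teacher_logits = [2.0, 0.5, -1.0]   # conventional (unimodal) teacher
clip_logits = [1.5, 1.0, -0.5]      # CLIP image-text similarity scores
student_logits = [1.0, 0.2, -0.3]

fused = fuse_logits(teacher_logits, clip_logits, alpha=0.5)
loss = kd_loss(student_logits, fused)
```

In this sketch the student is trained against `fused` instead of the teacher's logits alone; setting `alpha=1.0` recovers ordinary single-teacher distillation.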
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
