Heterogeneous Complementary Distillation
Liuchi Xu, Hao Zheng, Lu Wang, Lisheng Xu, Jun Cheng

TL;DR
This paper introduces Heterogeneous Complementary Distillation (HCD), a novel framework for knowledge transfer between different neural network architectures that leverages complementary features and shared logits to improve student model performance.
Contribution
HCD offers a simple, effective approach for heterogeneous knowledge distillation by integrating complementary features and decomposing logits, outperforming existing methods with lower complexity.
Findings
HCD achieves superior accuracy on CIFAR-100, CUB200, and ImageNet-1K datasets.
HCD effectively leverages complementary features for knowledge transfer.
HCD outperforms state-of-the-art heterogeneous KD methods.
Abstract
Knowledge distillation (KD) transfers the dark knowledge of a complex teacher to a compact student. However, heterogeneous architecture distillation, such as from a Vision Transformer (ViT) to ResNet18, faces challenges due to differences in spatial feature representations. Traditional KD methods are mostly designed for homogeneous architectures and hence struggle to address this disparity. Although heterogeneous KD approaches have recently been developed to solve these issues, they often incur high computational costs and complex designs, or rely too heavily on logit alignment, which limits their ability to leverage complementary features. To overcome these limitations, we propose Heterogeneous Complementary Distillation (HCD), a simple yet effective framework that integrates complementary teacher and student features to align representations in shared logits. These logits are…
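As a reference point for the "logit alignment" the abstract mentions, the following is a minimal sketch of the classic temperature-scaled KD objective of Hinton et al. (2015). It is not HCD: this summary does not describe how HCD fuses complementary features or decomposes logits. The function names and the default values of the temperature (4.0) and the weight alpha (0.5) are illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """KL(p_teacher || p_student) on softened distributions, scaled by T^2
    so its gradient magnitude stays comparable across temperatures."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return (temperature ** 2) * kl


def total_kd_loss(student_logits: Sequence[float],
                  teacher_logits: Sequence[float],
                  label: int,
                  alpha: float = 0.5,
                  temperature: float = 4.0) -> float:
    """(1 - alpha) * hard-label cross-entropy + alpha * soft distillation term.
    alpha and temperature are hypothetical defaults, not values from the paper."""
    ce = -math.log(softmax(student_logits)[label])
    return (1 - alpha) * ce + alpha * kd_loss(student_logits, teacher_logits, temperature)


# Usage: a student that matches the teacher exactly incurs no distillation loss.
print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Such purely logit-level matching is the kind of alignment that, per the abstract, can underuse complementary features when teacher and student architectures differ.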
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
