Heterogeneous Complementary Distillation

Liuchi Xu; Hao Zheng; Lu Wang; Lisheng Xu; Jun Cheng

arXiv:2511.10942 · cs.CV · February 16, 2026

TL;DR

This paper introduces Heterogeneous Complementary Distillation (HCD), a novel framework for knowledge transfer between different neural network architectures that leverages complementary features and shared logits to improve student model performance.

Contribution

HCD offers a simple, effective approach for heterogeneous knowledge distillation by integrating complementary features and decomposing logits, outperforming existing methods with lower complexity.

Findings

01. HCD achieves superior accuracy on the CIFAR-100, CUB200, and ImageNet-1K datasets.
02. HCD effectively leverages complementary features for knowledge transfer.
03. HCD outperforms state-of-the-art heterogeneous KD methods.

Abstract

Knowledge distillation (KD) transfers the dark knowledge from a complex teacher to a compact student. However, heterogeneous architecture distillation, such as Vision Transformer (ViT) to ResNet18, faces challenges due to differences in spatial feature representations. Traditional KD methods are mostly designed for homogeneous architectures and hence struggle to address this disparity effectively. Although heterogeneous KD approaches have recently been developed to solve these issues, they often incur high computational costs and complex designs, or rely too heavily on logit alignment, which limits their ability to leverage complementary features. To overcome these limitations, we propose Heterogeneous Complementary Distillation (HCD), a simple yet effective framework that integrates complementary teacher and student features to align representations in shared logits. These logits are…
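
For orientation, the logit alignment that the abstract contrasts HCD with can be sketched as the classic temperature-softened KD loss. This is a minimal illustration of standard logit-based distillation, not the HCD method itself, whose details are cut off above. The function names `softmax` and `kd_loss` and the default temperature of 4.0 are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration only: this is the generic
    logit-alignment loss, not the loss proposed in the paper.
    """
    p = softmax(teacher_logits, temperature)  # softened teacher distribution
    q = softmax(student_logits, temperature)  # softened student distribution
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [5.0, 1.0, 0.5]
    student = [2.0, 1.5, 1.0]
    print(kd_loss(student, teacher))  # positive: the distributions differ
    print(kd_loss(teacher, teacher))  # zero: the student matches the teacher
```

A loss of this kind compares only the output distributions, which is why it carries no information about how differently a ViT and a ResNet18 organize their intermediate spatial features.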

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications