Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation
Guopeng Li, Qiang Wang, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

TL;DR
This paper introduces a knowledge distillation approach that places an assistant model between heterogeneous teacher-student pairs and uses a spatial-agnostic loss to align their features, reporting accuracy gains of up to 11.47% on CIFAR-100 and 3.67% on ImageNet-1K.
Contribution
It proposes a new framework for cross-architecture knowledge distillation using an assistant model and a spatial-agnostic loss, enabling effective transfer between diverse model types.
Findings
Achieves an accuracy gain of up to 11.47% on CIFAR-100.
Improves accuracy on ImageNet-1K by up to 3.67%.
Effective across teacher-student combinations of CNNs, ViTs, and MLPs.
Abstract
Most knowledge distillation (KD) methodologies predominantly focus on teacher-student pairs with similar architectures, such as both being convolutional neural networks (CNNs). However, the potential and flexibility of KD can be greatly improved by expanding it to novel Cross-Architecture KD (CAKD), where the knowledge of homogeneous and heterogeneous teachers can be transferred flexibly to a given student. The primary challenge in CAKD lies in the substantial feature gaps between heterogeneous models, originating from the distinction of their inherent inductive biases and module functions. To this end, we introduce an assistant model as a bridge to facilitate smooth feature knowledge transfer between heterogeneous teachers and students. More importantly, within our proposed design principle, the assistant model combines the advantages of cross-architecture inductive biases and module…
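As a rough illustration of the spatial-agnostic feature alignment mentioned in the TL;DR, the Python sketch below pools each feature map over its spatial positions, so that a CNN grid and a ViT token sequence become comparable vectors, and then applies an InfoNCE-style contrastive loss across the batch. This is a sketch under stated assumptions, not the paper's implementation: the function names (global_average_pool, l2_normalize, info_nce), the choice of average pooling, the temperature of 0.1, and the requirement that student and teacher features share a channel count are all hypothetical choices made for illustration.

```python
import math


def global_average_pool(feature_map):
    """Collapse spatial positions: C channels x P positions -> C channel means.

    Assumption: features are stored channel-major as a list of C lists.
    P may differ between models (e.g. H*W grid cells vs. N tokens).
    """
    return [sum(channel) / len(channel) for channel in feature_map]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def info_nce(student_embs, teacher_embs, temperature=0.1):
    """Mean InfoNCE loss over a batch of pooled embeddings.

    For sample i, the teacher embedding i is the positive and every other
    teacher embedding in the batch is a negative.
    """
    s = [l2_normalize(e) for e in student_embs]
    t = [l2_normalize(e) for e in teacher_embs]
    total = 0.0
    for i, s_i in enumerate(s):
        logits = [sum(a * b for a, b in zip(s_i, t_j)) / temperature for t_j in t]
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(s)


# Hypothetical batch of two samples with two channels each.
# Student features have 4 spatial positions, teacher features have 3.
student_batch = [
    [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
]
teacher_batch = [
    [[2.0, 2.0, 2.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]],
]

student_embs = [global_average_pool(f) for f in student_batch]
teacher_embs = [global_average_pool(f) for f in teacher_batch]
loss = info_nce(student_embs, teacher_embs)
print(loss)
```

Pooling before the comparison means the loss never depends on where a feature sits in the grid or token sequence, which is one simple reading of "spatial-agnostic". In practice a learned projector would likely be needed when the channel counts of the two models differ.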
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Innovative Teaching and Learning Methods · AI in Service Interactions
Methods: Softmax · Attention Is All You Need · Focus · ALIGN · Knowledge Distillation · InfoNCE · Convolution
