One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

Zhiwei Hao; Jianyuan Guo; Kai Han; Yehui Tang; Han Hu; Yunhe Wang; Chang Xu

arXiv:2310.19444·cs.CV·October 31, 2023·31 cites

TL;DR

This paper introduces OFA-KD, a novel knowledge distillation framework that effectively bridges the gap between heterogeneous neural network architectures by projecting features into an aligned space and using adaptive target enhancement.

Contribution

The paper proposes OFA-KD, a simple yet effective method for cross-architecture knowledge distillation that improves performance across CNNs, Transformers, and MLPs.

Findings

01

Achieves an accuracy gain of up to 8.0% on CIFAR-100.

02

Improves accuracy by 0.7% on ImageNet-1K.

03

Demonstrates effectiveness across diverse architectures.

Abstract

Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family, particularly the hint-based approaches. By using centered kernel alignment (CKA) to compare the learned features between heterogeneous teacher and student models, we observe significant feature divergence. This divergence illustrates the ineffectiveness of previous hint-based methods in cross-architecture distillation. To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures. Specifically, we project intermediate features into an…
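The CKA comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the linear-kernel variant of CKA and flattened feature matrices of shape (samples, dimensions); the abstract does not state which kernel or feature layout the authors use. The function name `linear_cka` and the random "teacher" and "student" features are hypothetical placeholders, not taken from the paper or its repository.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, dim).

    Assumption: linear kernel; the feature dimensions of x and y may differ.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of samples")
    # Center every feature dimension over the sample axis.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # With a linear kernel, HSIC reduces to squared Frobenius norms.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical stand-ins for features extracted from the same 64 inputs.
rng = np.random.default_rng(0)
teacher_feats = rng.normal(size=(64, 32))
student_feats = rng.normal(size=(64, 16))

self_similarity = linear_cka(teacher_feats, teacher_feats)
scaled_similarity = linear_cka(teacher_feats, 3.0 * teacher_feats)
cross_similarity = linear_cka(teacher_feats, student_feats)
```

A score near 1 indicates closely aligned representations, while a low score between teacher and student layers corresponds to the kind of feature divergence the abstract describes. Linear CKA is invariant to isotropic scaling, which is why `scaled_similarity` matches `self_similarity`.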

Code & Models

Repositories

hao840/ofakd
PyTorch · Official

Videos

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings