Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation

Jhe-Hao Lin; Yi Yao; Chan-Feng Hsu; Hongxia Xie; Hong-Han Shuai; Wen-Huang Cheng

arXiv:2501.08885·cs.CV·October 17, 2025

TL;DR

This paper introduces a perspective-aware knowledge distillation framework that enables effective feature transfer across diverse neural network architectures, including CNNs, ViTs, and MLPs, by adapting to their heterogeneity.

Contribution

The proposed framework combines prompt tuning with region-aware attention to distill features between heterogeneous architectures, addressing the assumption in prior KD methods that teacher and student share a similar architecture.

Findings

01. Outperforms existing KD methods on CIFAR, ImageNet, and COCO datasets.

02. Effectively distills features across CNNs, ViTs, and MLPs.

03. Demonstrates robustness and adaptability in heterogeneous model settings.

Abstract

Knowledge distillation (KD) transfers knowledge from a large pre-trained teacher model to a lighter student model, reducing inference cost while maintaining comparable effectiveness. Prior KD techniques typically assume that the teacher and student models are homogeneous. However, a wide variety of architectures has since emerged, ranging from the initial Convolutional Neural Networks (CNNs) to Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs). Consequently, developing a universal KD framework compatible with any architecture has become an important research topic. In this paper, we introduce a perspective-aware teaching (PAT) KD framework to enable feature distillation across diverse architectures. Our framework comprises two key components. First, we design prompt tuning blocks that incorporate student feedback, allowing teacher…
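To make the general teacher-to-student transfer described in the abstract concrete, here is a minimal sketch of classic logit-based distillation (temperature-softened KL divergence). This is not the PAT method, whose prompt tuning blocks and region-aware attention are not specified in the text above; the function names `softmax` and `kd_loss` and the default temperature of 4.0 are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration: generic logit distillation,
    not the feature-level objective used by the paper.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.5]
    print("distillation loss:", kd_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge. Logit-level matching like this needs no shared internal structure, whereas the feature-level distillation targeted by the paper has to reconcile intermediate representations that differ between CNNs, ViTs, and MLPs.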

Taxonomy

Topics: Semantic Web and Ontologies · Machine Learning and Data Classification

Methods: Softmax · Attention Is All You Need