Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation
Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu, Hongxia Xie, Hong-Han Shuai, Wen-Huang Cheng

TL;DR
This paper introduces a perspective-aware knowledge distillation framework that enables effective feature transfer across diverse neural network architectures, including CNNs, ViTs, and MLPs, by adapting to their heterogeneity.
Contribution
The proposed framework incorporates prompt tuning and region-aware attention to facilitate heterogeneous architecture distillation, addressing a key limitation of prior methods.
Findings
Outperforms existing KD methods on CIFAR, ImageNet, and COCO datasets.
Effectively distills features across CNNs, ViTs, and MLPs.
Demonstrates robustness and adaptability in heterogeneous model settings.
Abstract
Knowledge distillation (KD) involves transferring knowledge from a pre-trained heavy teacher model to a lighter student model, thereby reducing the inference cost while maintaining comparable effectiveness. Prior KD techniques typically assume homogeneity between the teacher and student models. However, as technology advances, a wide variety of architectures have emerged, ranging from early Convolutional Neural Networks (CNNs) to Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs). Consequently, developing a universal KD framework compatible with any architecture has become an important research topic. In this paper, we introduce a perspective-aware teaching (PAT) KD framework to enable feature distillation across diverse architectures. Our framework comprises two key components. First, we design prompt tuning blocks that incorporate student feedback, allowing teacher…
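For readers new to KD, the teacher-to-student transfer described in the abstract can be illustrated with the classic logit-based distillation loss. This is a minimal sketch of that generic baseline, not the PAT feature-distillation method itself; the function names `softmax` and `kd_loss` and the default temperature of 4.0 are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic logit-based KD (hypothetical helper), not the PAT objective.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    print(kd_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge. A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of the non-target classes.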
Taxonomy
Topics: Semantic Web and Ontologies · Machine Learning and Data Classification
Methods: Softmax · Attention Is All You Need
