Multi-Level Decoupled Relational Distillation for Heterogeneous Architectures

Yaoxin Yang; Peng Ye; Weihao Lin; Kangcong Li; Yan Wen; Jia Hao; Tao Chen

arXiv:2502.06189·cs.CV·February 11, 2025

Open Access

TL;DR

This paper introduces MLDR-KD, a relational distillation framework for transferring knowledge between heterogeneous neural network architectures. It balances the teacher's dark knowledge against its confidence in the correct category, and reports accuracy gains of up to 4.86% on CIFAR-100 and 2.78% on Tiny-ImageNet.

Contribution

The paper proposes a multi-level decoupled relational distillation method with dynamic feature fusion, advancing heterogeneous model distillation beyond existing approaches.

Findings

01 Achieves up to 4.86% accuracy gain on CIFAR-100

02 Improves performance on Tiny-ImageNet by 2.78%

03 Demonstrates robustness across diverse architectures

Abstract

Heterogeneous distillation is an effective way to transfer knowledge from cross-architecture teacher models to student models. However, existing heterogeneous distillation methods do not take full advantage of the dark knowledge hidden in the teacher's output, which limits their performance. To this end, we propose a novel framework named Multi-Level Decoupled Relational Knowledge Distillation (MLDR-KD) to unleash the potential of relational distillation in heterogeneous distillation. Concretely, we first introduce Decoupled Fine-grained Relation Alignment (DFRA) at both the logit and feature levels to balance the trade-off between distilled dark knowledge and the confidence in the correct category of the heterogeneous teacher model. Then, a Multi-Scale Dynamic Fusion (MSDF) module is applied to dynamically fuse the projected logits of multi-scale features at different stages in the student model,…
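
The abstract does not give the DFRA loss itself, so the following is only a rough sketch of what relation alignment at the logit level could look like. It assumes, without confirmation from the text above, that relations are compared along two axes of a (batch, classes) logit matrix: across classes for each sample, and across samples for each class. The function names `softmax`, `kl`, and `relational_alignment`, and the temperature `tau`, are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(x, axis, tau=1.0):
    """Temperature-scaled softmax along the given axis (numerically stable)."""
    z = x / tau
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl(p, q, axis, eps=1e-12):
    """KL(p || q) summed along the given axis."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=axis)

def relational_alignment(teacher_logits, student_logits, tau=4.0):
    """Illustrative two-axis alignment; NOT the paper's exact DFRA loss.

    Both inputs have shape (batch, classes).
    Returns (sample_wise_loss, class_wise_loss).
    """
    # Sample-wise relation: for each sample, how probability mass is spread
    # across classes (this is where the teacher's dark knowledge lives).
    p_t = softmax(teacher_logits, axis=1, tau=tau)
    p_s = softmax(student_logits, axis=1, tau=tau)
    sample_loss = kl(p_t, p_s, axis=1).mean()

    # Class-wise relation: for each class, how strongly each sample in the
    # batch responds to it (assumed second axis of comparison).
    c_t = softmax(teacher_logits, axis=0, tau=tau)
    c_s = softmax(student_logits, axis=0, tau=tau)
    class_loss = kl(c_t, c_s, axis=0).mean()
    return sample_loss, class_loss

# Toy usage with random logits: 8 samples, 10 classes.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 10))
student = rng.normal(size=(8, 10))
sample_loss, class_loss = relational_alignment(teacher, student)
print(f"sample-wise: {sample_loss:.4f}, class-wise: {class_loss:.4f}")
```

A higher `tau` flattens the teacher distribution and exposes more of the non-target structure, while a lower `tau` emphasizes the teacher's confidence in its top class; this is one simple way to see the trade-off the abstract refers to.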

Taxonomy

Topics: Process Optimization and Integration · Advanced Control Systems Optimization