Harmonizing knowledge Transfer in Neural Network with Unified Distillation
Yaomin Huang, Zaomin Yan, Chaomin Shen, Faming Fang, and Guixu Zhang

TL;DR
This paper proposes a unified knowledge distillation framework that aggregates features from multiple layers to transfer comprehensive semantic knowledge from teacher to student neural networks.
Contribution
It introduces a novel approach that combines feature-based and logits-based distillation by aggregating intermediate features into a unified representation for knowledge transfer.
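For readers unfamiliar with the logits-based side of this combination, the following is a minimal sketch of the classic temperature-softened distillation loss. It is not taken from the paper: the function names `softmax` and `kd_logits_loss`, the default temperature, and the T^2 scaling convention are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened softmax; subtracting the max keeps exp() stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_logits_loss(teacher_logits, student_logits, temperature=4.0):
    # KL(teacher || student) between softened distributions.
    # The T^2 factor is a common convention (an assumption here) that keeps
    # gradient magnitudes comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.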
Findings
The student network's performance improves across various tasks.
Aggregating multi-layer features effectively enhances knowledge transfer.
A unified distribution constraint ensures coherent knowledge distillation.
Abstract
Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the architecture, has been garnering increasing attention. Two primary categories emerge within KD methods: feature-based, focusing on intermediate layers' features, and logits-based, targeting the final layer's logits. This paper introduces a novel perspective by leveraging diverse knowledge sources within a unified KD framework. Specifically, we aggregate features from intermediate layers into a comprehensive representation, effectively gathering semantic information from different stages and scales. Subsequently, we predict the distribution parameters from this representation. These steps transform knowledge from the intermediate layers into corresponding distributive forms, thereby allowing for knowledge distillation through a…
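The pipeline outlined in the abstract (aggregate intermediate features, predict distribution parameters, match distributions) might be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the excerpt does not name the distribution family, so a diagonal Gaussian is assumed here, and `global_avg_pool`, `aggregate`, `predict_gaussian`, `gaussian_kl`, and the linear heads `w_mu` and `w_logvar` are hypothetical names.

```python
import math

def global_avg_pool(feature_map):
    # feature_map: list of channels, each a flat list of spatial activations.
    return [sum(ch) / len(ch) for ch in feature_map]

def aggregate(stage_features):
    # Concatenate pooled descriptors from every stage into one vector,
    # so stages with different spatial sizes can be combined.
    vec = []
    for fmap in stage_features:
        vec.extend(global_avg_pool(fmap))
    return vec

def predict_gaussian(vec, w_mu, w_logvar):
    # Hypothetical linear heads (assumption): one weight row per output dim.
    mu = [sum(w * x for w, x in zip(row, vec)) for row in w_mu]
    logvar = [sum(w * x for w, x in zip(row, vec)) for row in w_logvar]
    return mu, logvar

def gaussian_kl(mu_t, logvar_t, mu_s, logvar_s):
    # KL(N(mu_t, var_t) || N(mu_s, var_s)) for diagonal Gaussians,
    # summed over dimensions. The Gaussian form is an assumption.
    total = 0.0
    for mt, lt, ms, ls in zip(mu_t, logvar_t, mu_s, logvar_s):
        total += 0.5 * (ls - lt + (math.exp(lt) + (mt - ms) ** 2) / math.exp(ls) - 1.0)
    return total
```

In this sketch the teacher and student would each run `aggregate` and `predict_gaussian` on their own features, and `gaussian_kl` would serve as the feature-side distillation term alongside a logits-based loss.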
Taxonomy
Topics: Neural Networks and Applications
Methods: Knowledge Distillation
