Variational Information Distillation for Knowledge Transfer
Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai

TL;DR
This paper frames knowledge transfer between neural networks as maximizing the mutual information between the teacher and the student, which improves student performance, including when the two networks have different architectures.
Contribution
It proposes a mutual information-based method for knowledge transfer that outperforms existing approaches, especially across heterogeneous network architectures.
Findings
Outperforms existing knowledge transfer methods on both knowledge distillation and transfer learning tasks.
Effective transfer from CNNs to MLPs, achieving comparable performance.
Demonstrates robustness across different network architectures.
Abstract
Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match the activations or the corresponding hand-crafted features of the teacher and the student networks. We propose an information-theoretic framework for knowledge transfer which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. We compare our method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that our method consistently outperforms existing methods. We further demonstrate the strength of our method on knowledge transfer across heterogeneous network architectures by transferring knowledge from a convolutional neural network…
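The abstract states the objective (maximize mutual information between teacher and student) without showing how it might be optimized. Mutual information is generally intractable, but it admits a standard variational lower bound, I(t; s) = H(t) - H(t|s) >= H(t) + E[log q(t|s)], so maximizing the log-likelihood of teacher features under a model q conditioned on the student tightens the bound. The sketch below assumes a Gaussian q with a per-channel variance. The function names (`softplus`, `vid_loss`), the argument names, and the variance parameterization are illustrative assumptions and are not confirmed by this page.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the variance positive.
    return np.logaddexp(0.0, x)

def vid_loss(teacher_feat, student_pred_mean, alpha, eps=1e-5):
    """Gaussian negative log-likelihood of teacher features (constant dropped).

    Minimizing this maximizes E[log q(t|s)], the trainable part of the
    variational lower bound on I(t; s). All names here are hypothetical.

    teacher_feat:      (batch, channels) teacher activations t
    student_pred_mean: (batch, channels) mean of q(t|s), computed from the student
    alpha:             (channels,) unconstrained per-channel variance parameter
    """
    var = softplus(alpha) + eps                      # (channels,), strictly positive
    sq_err = (teacher_feat - student_pred_mean) ** 2
    nll = 0.5 * (np.log(var) + sq_err / var)         # broadcasts over the batch
    return nll.sum(axis=1).mean()                    # sum channels, average batch

# Usage: a student that predicts the teacher exactly gets a lower loss.
t = np.ones((4, 3))
alpha = np.zeros(3)
loss_match = vid_loss(t, t, alpha)
loss_off = vid_loss(t, t + 1.0, alpha)
```

In training, a term like this would typically be added to the student's task loss with a weighting coefficient, and `alpha` would be learned jointly with the student's parameters.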
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
