Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
Zehao Huang, Naiyan Wang

TL;DR
This paper introduces a knowledge transfer method that matches the distributions of neuron selectivity patterns between teacher and student networks by minimizing Maximum Mean Discrepancy (MMD), improving student performance and transferability across tasks.
Contribution
It proposes a new distribution matching approach for knowledge transfer based on neuron selectivity patterns and a specialized loss function, enhancing neural network compression.
Findings
Significant performance improvements in student networks.
Effective across multiple datasets and tasks.
Complementary to existing knowledge transfer methods.
Abstract
Although deep neural networks have demonstrated extraordinary power in various applications, their superior performance comes at the expense of high storage and computational costs. Consequently, the acceleration and compression of neural networks have attracted much attention recently. Knowledge Transfer (KT), which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the popular solutions. In this paper, we propose a novel knowledge transfer method by treating it as a distribution matching problem. In particular, we match the distributions of neuron selectivity patterns between teacher and student networks. To achieve this goal, we devise a new KT loss function by minimizing the Maximum Mean Discrepancy (MMD) metric between these distributions. Combined with the original loss function, our method can significantly improve the performance…
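The distribution-matching idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions that the summary above does not state: each channel's flattened activation map is treated as one sample of a "neuron selectivity pattern", the maps are L2-normalised, the kernel is a degree-2 polynomial, and the squared MMD is computed with the simple biased estimator. The names `mmd2`, `poly_kernel`, `total_loss`, and the `weight` parameter are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # x has shape (channels, positions); each row is one flattened activation map.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def poly_kernel(a, b, degree=2, c=0.0):
    # Polynomial kernel between every row of a and every row of b (assumed choice).
    return (a @ b.T + c) ** degree

def mmd2(teacher, student, kernel=poly_kernel):
    # Biased estimate of squared MMD between two sets of activation maps.
    # The channel counts may differ; the number of positions must match.
    t = l2_normalize(teacher)
    s = l2_normalize(student)
    return kernel(t, t).mean() + kernel(s, s).mean() - 2.0 * kernel(t, s).mean()

def total_loss(task_loss, teacher, student, weight=1.0):
    # Original task loss plus a weighted transfer term; `weight` is a hypothetical knob.
    return task_loss + weight * mmd2(teacher, student)

rng = np.random.default_rng(0)
teacher_maps = rng.random((64, 49))   # e.g. 64 channels over a 7x7 feature map
student_maps = rng.random((32, 49))   # fewer channels, same spatial size
print(mmd2(teacher_maps, student_maps))
print(total_loss(1.25, teacher_maps, student_maps, weight=0.5))
```

Because the maps are normalised before the kernel is applied, the transfer term in this sketch ignores the overall scale of each channel and compares only the spatial pattern of its responses. In a real training loop the same computation would be written in an autodiff framework so the term can be backpropagated into the student.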
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
