MUSE: Feature Self-Distillation with Mutual Information and Self-Information
Yu Gong, Ye Yu, Gaurav Mittal, Greg Mori, Mei Chen

TL;DR
MUSE is an information-theoretic method that combines mutual information and self-information to introduce dependency among CNN features and improve their expressivity, which in turn improves self-distillation and online distillation across a range of architectures.
Contribution
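The paper proposes MUSE, which uses mutual information and self-information as a proxy for introducing dependency among features extracted from different layers of a CNN. It comes in two variants, Additive Information and Multiplicative Information, and is reported to outperform other feature discrepancy functions in knowledge distillation.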
Findings
MUSE outperforms other feature discrepancy functions in distillation tasks.
MUSE is versatile and extends to tasks beyond image classification.
Empirical results show MUSE's superior performance across architectures.
Abstract
We present a novel information-theoretic approach to introduce dependency among features of a deep convolutional neural network (CNN). The core idea of our proposed method, called MUSE, is to combine MUtual information and SElf-information to jointly improve the expressivity of all features extracted from different layers in a CNN. We present two variants of the realization of MUSE -- Additive Information and Multiplicative Information. Importantly, we argue and empirically demonstrate that MUSE, compared to other feature discrepancy functions, is a more functional proxy to introduce dependency and effectively improve the expressivity of all features in the knowledge distillation framework. MUSE achieves superior performance over a variety of popular architectures and feature discrepancy functions for self-distillation and online distillation, and performs competitively with the…
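The two quantities the abstract combines can be illustrated on a small discrete example. The sketch below is not the paper's estimator or loss: it assumes a discrete joint distribution over two variables, and the function names `additive_information` and `multiplicative_information`, the choice of which variable's self-information enters the combination, and the absence of any weighting term are all assumptions made here for illustration. It only shows how a sum and a product of mutual information I(X;Y) and self-information H(X) behave.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(q * math.log(q) for q in p if q > 0)


def marginals(joint):
    """Row and column marginals of a joint distribution joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py


def mutual_information(joint):
    """I(X;Y) in nats for a joint distribution given as a nested list."""
    px, py = marginals(joint)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


def additive_information(joint):
    # Assumed form for illustration: I(X;Y) + H(X), unweighted.
    px, _ = marginals(joint)
    return mutual_information(joint) + entropy(px)


def multiplicative_information(joint):
    # Assumed form for illustration: I(X;Y) * H(X).
    px, _ = marginals(joint)
    return mutual_information(joint) * entropy(px)


# Two toy joints: X and Y independent, and X and Y perfectly dependent.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]

for name, joint in [("independent", independent), ("dependent", dependent)]:
    print(name, additive_information(joint), multiplicative_information(joint))
```

In this toy setting the product vanishes whenever the two variables are independent, while the sum still credits the self-information of X. That is one way to see why the two combinations can behave differently as training objectives.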
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
Methods: Knowledge Distillation
