Estimating and Maximizing Mutual Information for Knowledge Distillation
Aman Shrivastava, Yanjun Qi, Vicente Ordonez

TL;DR
This paper introduces a mutual information maximization approach for knowledge distillation that enhances low-capacity models by transferring knowledge from more complex models, improving accuracy across various architectures.
Contribution
The paper presents a flexible contrastive-based method to estimate and maximize mutual information for knowledge transfer between arbitrary teacher and student networks.
Findings
Outperforms existing distillation methods across diverse teacher-student network pairs
Achieves 74.55% accuracy on CIFAR-100 with ShuffleNetV2
Improves ImageNet ResNet-18 accuracy from 68.88% to 70.32%
Abstract
In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information of local and global feature representations between a teacher and a student network. We demonstrate through extensive experiments that this can be used to improve the performance of low-capacity models by transferring knowledge from more performant but computationally expensive models, producing better models that can run on devices with limited computational resources. Our method is flexible: we can distill knowledge from teachers with arbitrary network architectures to arbitrary student networks. Our empirical results show that MIMKD outperforms competing approaches across a wide range of student-teacher pairs with different capacities, with different…
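To make the idea of a contrastive lower bound on mutual information concrete, here is a minimal sketch in Python. It uses the InfoNCE bound, which is one well-known contrastive estimator and is swapped in here purely for illustration; the abstract does not say which estimator MIMKD uses, so this should not be read as the authors' objective. The function name `info_nce_lower_bound`, the cosine-similarity critic, and the `temperature` parameter are all assumptions, and the random arrays stand in for pooled teacher and student features.

```python
import numpy as np

def info_nce_lower_bound(teacher_feats, student_feats, temperature=0.1):
    """Illustrative InfoNCE lower bound (in nats) on the mutual information
    between paired teacher and student feature vectors.

    Row i of each (N, D) array is assumed to come from the same input, so the
    diagonal of the similarity matrix holds the positive pairs and every
    off-diagonal entry acts as a negative.
    """
    # Assumption: cosine similarity as the critic, so L2-normalize each row.
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)

    logits = (s @ t.T) / temperature  # (N, N) student-vs-teacher scores

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    n = logits.shape[0]
    # InfoNCE: log N plus the mean log-probability of the positive pair.
    # The estimate can never exceed log N.
    return float(np.log(n) + np.mean(np.diag(log_probs)))

# Hypothetical stand-in features, not data from the paper.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(64, 16))
student_aligned = teacher + 0.05 * rng.normal(size=(64, 16))
student_unrelated = rng.normal(size=(64, 16))

bound_aligned = info_nce_lower_bound(teacher, student_aligned)
bound_unrelated = info_nce_lower_bound(teacher, student_unrelated)
```

In a distillation setting, the negative of such a bound would typically be added to the student's training loss, so that minimizing the loss pushes the bound up. Student features that track the teacher's yield a higher estimate than unrelated ones.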
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
