Estimating and Maximizing Mutual Information for Knowledge Distillation

Aman Shrivastava; Yanjun Qi; Vicente Ordonez

arXiv:2110.15946 · cs.CV · May 12, 2023

TL;DR

This paper introduces a mutual information maximization approach for knowledge distillation that enhances low-capacity models by transferring knowledge from more complex models, improving accuracy across various architectures.

Contribution

The paper presents a flexible contrastive-based method to estimate and maximize mutual information for knowledge transfer between arbitrary teacher and student networks.

Findings

01. Outperforms existing methods across diverse network pairs

02. Achieves 74.55% accuracy on CIFAR-100 with ShuffleNetV2

03. Improves ImageNet ResNet-18 accuracy from 68.88% to 70.32%

Abstract

In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information of local and global feature representations between a teacher and a student network. We demonstrate through extensive experiments that this can improve the performance of low-capacity models by transferring knowledge from more performant but computationally expensive models, producing better models that can run on devices with limited computational resources. Our method is flexible: we can distill knowledge from teachers with arbitrary network architectures to arbitrary student networks. Our empirical results show that MIMKD outperforms competing approaches across a wide range of student-teacher pairs with different capacities, with different…
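To make the idea of a contrastive lower bound on mutual information concrete, here is a minimal sketch using the InfoNCE bound, one common contrastive estimator. The abstract does not say which bound MIMKD uses, so InfoNCE, the dot-product critic, and the function names below are all assumptions made for illustration, not the authors' implementation. Matching teacher and student features from the same input are treated as positive pairs, and all other pairings in the batch as negatives.

```python
import math


def dot(u, v):
    """Dot-product critic (an assumed scoring function, chosen for simplicity)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_lower_bound(teacher_feats, student_feats):
    """InfoNCE estimate of a lower bound on mutual information.

    teacher_feats[i] and student_feats[i] are feature vectors computed
    from the same input (a positive pair); every other pairing in the
    batch serves as a negative. The estimate can never exceed log(N)
    for a batch of N pairs.
    """
    n = len(teacher_feats)
    total = 0.0
    for i in range(n):
        scores = [dot(teacher_feats[i], student_feats[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all candidate pairings.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Log-probability that the true partner is picked out of the batch.
        total += scores[i] - log_norm
    return math.log(n) + total / n


def distillation_loss(teacher_feats, student_feats):
    """Minimizing this loss maximizes the estimated bound."""
    return -info_nce_lower_bound(teacher_feats, student_feats)
```

In a real training loop the features would come from the two networks and the loss would be minimized by gradient descent on the student (and on any learned critic); this sketch only shows how the bound itself is computed from a batch of paired features.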

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Knowledge Distillation