Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information

Linfeng Ye; Shayan Mohajer Hamidi; Renhao Tan; En-Hui Yang

arXiv:2401.08732 · cs.LG · March 11, 2024 · 1 citation

TL;DR

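This paper introduces the maximum conditional mutual information (MCMI) estimator for knowledge distillation. The teacher is trained to maximize both its log-likelihood and its conditional mutual information, which gives the student a better estimate of the Bayes conditional probability distribution and improves student performance, especially in zero-shot and few-shot settings.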

Contribution

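The paper proposes the MCMI method, which improves estimation of the Bayes conditional probability distribution (BCPD) by maximizing conditional mutual information (CMI) alongside the log-likelihood during teacher training, making knowledge distillation (KD) more effective.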

Findings

01

Student classification accuracy improves by up to 3.32% when the teacher is trained with MCMI rather than MLL estimation.

02

Gains are larger in zero-shot and few-shot settings: in the zero-shot case, student accuracy on an omitted class rises from 0% to as high as 84%.

03

Eigen-CAM visualizations show that maximizing the teacher's CMI lets the teacher capture more contextual information in an image cluster.

Abstract

It is believed that in knowledge distillation (KD), the role of the teacher is to provide an estimate of the unknown Bayes conditional probability distribution (BCPD) to be used in the student training process. Conventionally, this estimate is obtained by training the teacher using the maximum log-likelihood (MLL) method. To improve this estimate for KD, in this paper we introduce the concept of conditional mutual information (CMI) into the estimation of BCPD and propose a novel estimator called the maximum CMI (MCMI) method. Specifically, in MCMI estimation, both the log-likelihood and the CMI of the teacher are simultaneously maximized when the teacher is trained. Through Eigen-CAM, it is further shown that maximizing the teacher's CMI value allows the teacher to capture more contextual information in an image cluster. Through a thorough set of experiments, we show that by employing a…
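To make the idea of maximizing log-likelihood and CMI at the same time more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes that the CMI of a teacher can be estimated as the average KL divergence between each sample's output distribution and the mean output distribution (centroid) of its ground-truth class, and that the two terms are combined with a trade-off weight `lam`. The function names, the weight `lam`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def empirical_cmi(probs, labels):
    """Assumed estimator of I(X; Y_hat | Y): mean KL divergence between each
    sample's output distribution and the centroid of its label's cluster."""
    # Group the teacher's output distributions by ground-truth label.
    clusters = defaultdict(list)
    for p, y in zip(probs, labels):
        clusters[y].append(p)
    # Centroid for label y: average output distribution over that cluster.
    centroids = {
        y: [sum(col) / len(ps) for col in zip(*ps)]
        for y, ps in clusters.items()
    }
    total = sum(kl_divergence(p, centroids[y]) for p, y in zip(probs, labels))
    return total / len(probs)


def negative_log_likelihood(probs, labels, eps=1e-12):
    """Average cross-entropy of the predicted distributions against hard labels."""
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(probs)


def mcmi_objective(probs, labels, lam=0.1):
    """Loss to minimise: NLL minus lam * CMI, so lowering it raises both the
    log-likelihood and the CMI. `lam` is a hypothetical trade-off weight."""
    return negative_log_likelihood(probs, labels) - lam * empirical_cmi(probs, labels)


# Two toy teachers evaluated on four samples from two classes.
labels = [0, 0, 1, 1]
# Outputs are identical within each class, so every sample sits on its centroid.
collapsed = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
# Outputs vary within each class, so they retain sample-specific information.
spread = [[0.95, 0.05], [0.85, 0.15], [0.05, 0.95], [0.15, 0.85]]

cmi_collapsed = empirical_cmi(collapsed, labels)
cmi_spread = empirical_cmi(spread, labels)
print(cmi_collapsed, cmi_spread)
print(mcmi_objective(collapsed, labels), mcmi_objective(spread, labels))
```

Under this estimator, a teacher whose outputs collapse onto one distribution per class has zero CMI, while a teacher whose outputs differ from sample to sample within a class has positive CMI. In an actual training loop the same two terms would be computed on the teacher network's softmax outputs and differentiated with respect to its weights.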

Code & Models

Repositories

iclr2024mcmi/iclrmcmi (PyTorch, official)

Videos

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information (SlidesLive)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Machine Learning and Algorithms

Methods: Sparse Evolutionary Training · Knowledge Distillation