Meta Knowledge Distillation

Jihao Liu; Boxiao Liu; Hongsheng Li; Yu Liu

arXiv:2202.07940 · cs.LG · February 17, 2022 · 20 cites

TL;DR

Meta Knowledge Distillation (MKD) meta-learns the softmax temperature parameters used in knowledge distillation (KD). This mitigates KD's two known degradation problems, the teacher-student gap and the incompatibility with strong data augmentations, and improves performance across models and augmentation settings.

Contribution

The paper proposes MKD, a meta-learning method that adapts the teacher and student softmax temperatures during training. It replaces hand-tuned or grid-searched temperatures, which the authors report transfer poorly across settings.

Findings

01. MKD achieves state-of-the-art results on ViT architectures with only ImageNet-1K data.

02. MKD outperforms existing methods by 0.6% with fewer training epochs.

03. MKD is robust across different datasets, architectures, and data augmentations.

Abstract

Recent studies have pointed out that knowledge distillation (KD) suffers from two degradation problems, the teacher-student gap and the incompatibility with strong data augmentations, making it inapplicable to training state-of-the-art models, which are trained with advanced augmentations. However, we observe that a key factor, i.e., the temperatures in the softmax functions for generating probabilities of both the teacher and student models, was mostly overlooked in previous methods. With properly tuned temperatures, such degradation problems of KD can be largely mitigated. However, instead of relying on a naive grid search, which shows poor transferability, we propose Meta Knowledge Distillation (MKD) to meta-learn the distillation with learnable meta temperature parameters. The meta parameters are adaptively adjusted during training according to the gradients of the learning objective. We…
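
To make the role of the temperatures concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the common KD formulation in which the loss is the KL divergence between temperature-scaled teacher and student softmax outputs. As a stand-in for "adjusted according to the gradients of the learning objective", it takes gradient steps on the teacher temperature using the KD loss itself as the objective. The actual meta-objective and update rule of MKD are not given in this excerpt, and the logits, step size, and function names below are hypothetical.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, t_student, t_teacher):
    """KL(teacher || student), with a separate temperature for each model."""
    p = softmax(teacher_logits, t_teacher)
    q = softmax(student_logits, t_student)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def temperature_grad(loss_fn, t, eps=1e-5):
    """Central finite-difference estimate of d(loss)/d(temperature)."""
    return (loss_fn(t + eps) - loss_fn(t - eps)) / (2 * eps)

# Hypothetical logits for a 3-class problem (illustrative values only).
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]

t_student = 1.0  # held fixed in this sketch
t_teacher = 1.0  # treated as the learnable temperature
lr = 0.1         # assumed step size

initial_loss = kd_loss(student_logits, teacher_logits, t_student, t_teacher)
for _ in range(50):
    # Assumption: the KD loss stands in for the (unspecified) learning objective.
    g = temperature_grad(
        lambda t: kd_loss(student_logits, teacher_logits, t_student, t),
        t_teacher,
    )
    t_teacher = max(t_teacher - lr * g, 1e-2)  # keep the temperature positive
final_loss = kd_loss(student_logits, teacher_logits, t_student, t_teacher)

print(f"teacher temperature: 1.0 -> {t_teacher:.3f}")
print(f"KD loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

In this toy setting the teacher is much more confident than the student, so the gradient steps raise the teacher temperature above 1 and the KD loss falls. A fixed temperature chosen by grid search would instead stay constant for the whole run, which is the contrast the abstract draws.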

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI

Methods: Masked autoencoder · Knowledge Distillation · Softmax