Meta Knowledge Distillation
Jihao Liu, Boxiao Liu, Hongsheng Li, Yu Liu

TL;DR
Meta Knowledge Distillation (MKD) introduces a meta-learning approach to optimize temperature parameters in knowledge distillation, effectively mitigating degradation issues and improving performance across various models and augmentations.
Contribution
The paper proposes MKD, a meta-learning method to adaptively tune temperature parameters in KD, enhancing robustness and transferability over prior fixed-temperature approaches.
Findings
MKD achieves state-of-the-art results on ViT architectures with only ImageNet-1K data.
MKD outperforms existing methods by 0.6% with fewer training epochs.
MKD is robust across different datasets, architectures, and data augmentations.
Abstract
Recent studies pointed out that knowledge distillation (KD) suffers from two degradation problems, the teacher-student gap and the incompatibility with strong data augmentations, making it not applicable to training state-of-the-art models, which are trained with advanced augmentations. However, we observe that a key factor, i.e., the temperatures in the softmax functions for generating probabilities of both the teacher and student models, was mostly overlooked in previous methods. With properly tuned temperatures, such degradation problems of KD can be much mitigated. However, instead of relying on a naive grid search, which shows poor transferability, we propose Meta Knowledge Distillation (MKD) to meta-learn the distillation with learnable meta temperature parameters. The meta parameters are adaptively adjusted during training according to the gradients of the learning objective. We…
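To make the role of the two temperatures concrete, here is a minimal sketch of a distillation loss in which the teacher and student softmax functions each have their own temperature. It is an illustration under stated assumptions, not the authors' implementation. The function names, the choice of KL divergence as the loss, and the example logits are all hypothetical. The meta-learning update of the temperatures is not described in this summary, so it is not shown.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, t_student, t_teacher):
    """KL(teacher || student) with a separate temperature for each model.

    Assumption: KL divergence is used as the distillation loss, as in
    standard KD. In MKD the two temperatures would be learnable meta
    parameters; here they are plain floats passed in by the caller.
    """
    p_teacher = softmax_with_temperature(teacher_logits, t_teacher)
    p_student = softmax_with_temperature(student_logits, t_student)
    return sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.2]

sharp = softmax_with_temperature(teacher, 1.0)
soft = softmax_with_temperature(teacher, 4.0)
loss = kd_loss(student, teacher, t_student=1.0, t_teacher=2.0)
```

A higher temperature flattens the distribution (`soft` is closer to uniform than `sharp`), which changes how much of the teacher's ranking over non-target classes reaches the student. Setting the teacher and student temperatures independently is the degree of freedom that a fixed, shared temperature does not offer.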
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
Methods: Masked autoencoder · Knowledge Distillation · Softmax
