Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study
Hongjun Choi, Eun Som Jeon, Ankita Shukla, Pavan Turaga

TL;DR
This paper empirically investigates how mixup data augmentation influences knowledge distillation, revealing the importance of smoothness and proposing strategies to improve student network training.
Contribution
It provides a detailed empirical analysis of the compatibility between mixup and knowledge distillation, highlighting the role of smoothness and suggesting improved training strategies.
Findings
Smoothness links mixup and KD.
Mixup enhances KD effectiveness.
Proposed strategies improve student network performance.
Abstract
Mixup is a popular data augmentation technique that creates new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning; it uses a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different. However, we found that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of mixup in knowledge distillation. In this paper, we present a detailed empirical study on various important dimensions of compatibility between…
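The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`mixup`, `softmax`, `kd_loss`), the temperature value, and the T² scaling are assumptions that follow the commonly used formulations of mixup and temperature-based distillation, and the excerpt above does not say which variants the paper uses. The sketch shows how a mixed sample is formed by linear interpolation and how a student's softened output can be compared against a teacher's.

```python
import math
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two samples and their one-hot labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a smoother distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2 (assumed convention)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Example: draw the mixing weight from Beta(alpha, alpha); alpha = 0.2 is an assumed value.
lam = random.betavariate(0.2, 0.2)
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam)
```

In a distillation setup of this kind, the mixed input `x_mix` would be passed through both networks and `kd_loss` would be evaluated on the resulting logits. Raising `temperature` flattens both distributions, which is one concrete sense in which "smoothness" enters the teacher's supervision signal.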
Taxonomy
Topics: AI in cancer detection · Machine Learning and Data Classification · Generative Adversarial Networks and Image Synthesis
Methods: Mixup · Knowledge Distillation
