Mixed Sample Augmentation for Online Distillation
Yiqing Shen, Liwu Xu, Yuzhe Yang, Yaqian Li, Yandong Guo

TL;DR
This paper introduces CutⁿMix, a mixed sample augmentation method for online knowledge distillation, and reports consistent improvements over existing distillation methods in experiments on CIFAR datasets.
Contribution
The paper proposes CutⁿMix, a mixed sample regularization technique designed specifically for online distillation, and develops an online distillation framework that strengthens mutual learning and self-ensemble strategies.
Findings
The proposed method consistently outperforms state-of-the-art distillation methods on CIFAR datasets.
CutⁿMix significantly improves the performance of online distillation.
The proposed framework enhances feature-level mutual learning and self-ensemble benefits.
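For readers unfamiliar with mutual learning among peer students, the following is a minimal sketch of a generic logit-level mutual learning loss, in which each peer is pulled toward the softened predictions of the other peers. It is not the paper's feature-level framework: the function names, the temperature value, and the plain averaging over peers are assumptions made for illustration only.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature, numerically stabilized."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Mean over the batch of KL(p || q) for row-wise distributions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def mutual_learning_losses(peer_logits, temperature=3.0):
    """Return one distillation loss per peer (hypothetical helper).

    peer_logits: list of arrays of shape (batch, classes), one per peer.
    Peer i's loss is the average of KL(p_j || p_i) over all other peers j,
    so every peer acts as both student and teacher.
    """
    if len(peer_logits) < 2:
        raise ValueError("mutual learning needs at least two peers")
    probs = [softmax(logits, temperature) for logits in peer_logits]
    losses = []
    for i, p_i in enumerate(probs):
        others = [kl_divergence(p_j, p_i) for j, p_j in enumerate(probs) if j != i]
        losses.append(sum(others) / len(others))
    return losses
```

In a training loop, each peer's distillation term would typically be added to its own supervised classification loss; how the two terms are weighted is left open here.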
Abstract
Mixed Sample Regularization (MSR), such as MixUp or CutMix, is a powerful data augmentation strategy to generalize convolutional neural networks. Previous empirical analysis has illustrated an orthogonal performance gain between MSR and conventional offline Knowledge Distillation (KD). To be more specific, student networks can be enhanced with the involvement of MSR in the training stage of sequential distillation. Yet, the interplay between MSR and online knowledge distillation, where an ensemble of peer students learn mutually from each other, remains unexplored. To bridge the gap, we make the first attempt at incorporating CutMix into online distillation, where we empirically observe a significant improvement. Encouraged by this fact, we propose an even stronger MSR specifically for online distillation, named CutⁿMix. Furthermore, a novel online distillation…
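As background for the abstract, here is a minimal NumPy sketch of standard CutMix, the baseline augmentation that the authors first incorporate into online distillation. It is not CutⁿMix, whose details are not given in this excerpt. The function names, the NCHW image layout, the one-hot label format, and the Beta(alpha, alpha) sampling with alpha=1.0 are assumptions for illustration.

```python
import numpy as np

def rand_bbox(height, width, lam, rng):
    """Sample a box covering roughly (1 - lam) of the image area."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h = int(height * cut_ratio)
    cut_w = int(width * cut_ratio)
    cy = int(rng.integers(0, height))  # box center, uniform over the image
    cx = int(rng.integers(0, width))
    top = max(cy - cut_h // 2, 0)
    bottom = min(cy + cut_h // 2, height)
    left = max(cx - cut_w // 2, 0)
    right = min(cx + cut_w // 2, width)
    return top, bottom, left, right

def cutmix(images, labels_onehot, alpha=1.0, rng=None):
    """Apply CutMix to a batch (hypothetical helper).

    images: float array of shape (N, C, H, W).
    labels_onehot: float array of shape (N, num_classes).
    Returns the mixed images, the mixed labels, and the effective mixing ratio.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, _, h, w = images.shape
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(n)  # partner image for each sample
    top, bottom, left, right = rand_bbox(h, w, lam, rng)
    mixed = images.copy()
    # Paste the same patch location from each partner image.
    mixed[:, :, top:bottom, left:right] = images[perm][:, :, top:bottom, left:right]
    # Recompute the ratio from the clipped box so labels match the pixels.
    lam_adj = 1.0 - (bottom - top) * (right - left) / (h * w)
    mixed_labels = lam_adj * labels_onehot + (1.0 - lam_adj) * labels_onehot[perm]
    return mixed, mixed_labels, lam_adj
```

The mixing ratio is recomputed after clipping because a box near the image border covers less area than the sampled value implies.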
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification
Methods: Mixup · Knowledge Distillation · CutMix
