Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation
Niclas Popp, Kevin Alexander Laube, Matthias Hein, Lukas Schott

TL;DR
This paper proposes a diffusion-based data augmentation method that enhances knowledge distillation robustness against unknown covariate shifts by generating challenging samples that improve worst-case and average accuracy.
Contribution
It introduces a novel diffusion-based augmentation strategy that maximizes teacher-student disagreement to improve robustness under covariate shift in knowledge distillation.
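As a rough illustration of what "teacher-student disagreement" can mean, the sketch below scores candidate images by how much more probability the teacher assigns to the true class than the student does, then ranks candidates by that gap. This is only one plausible scoring rule, not the paper's actual objective. The function names (`disagreement`, `rank_by_disagreement`) and the logit inputs are hypothetical, and the diffusion model that would generate the candidates is omitted entirely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disagreement(teacher_logits, student_logits, label):
    """Assumed score: teacher confidence minus student confidence on the true class.

    A large positive value means the teacher is confident and the student is not,
    i.e. the sample is "challenging" for the student in the sense used here.
    """
    p_teacher = softmax(teacher_logits)[label]
    p_student = softmax(student_logits)[label]
    return p_teacher - p_student


def rank_by_disagreement(candidates):
    """Return candidate indices sorted from highest to lowest disagreement.

    candidates: list of (teacher_logits, student_logits, label) tuples.
    """
    scores = [disagreement(t, s, y) for t, s, y in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

In a pipeline built this way, the top-ranked candidates would be added to the distillation training set. A gradient-based variant could instead use a score of this kind to guide the sampling process itself.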
Findings
Significant improvement in worst-group accuracy on CelebA and SpuCo Birds.
Enhanced mean group accuracy and spurious mAUC on ImageNet under covariate shift.
Outperforms existing diffusion-based augmentation methods.
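For readers unfamiliar with the group metrics named above, here is a minimal sketch of how worst-group and mean group accuracy are commonly computed: accuracy is measured separately within each group (for example, each combination of class and spurious attribute), and then the minimum and the unweighted mean are taken across groups. The function names and the group labels are illustrative assumptions. The paper's exact evaluation protocol may differ.

```python
from collections import defaultdict


def group_accuracies(preds, labels, groups):
    """Accuracy computed separately for each group identifier."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_and_mean_group_accuracy(preds, labels, groups):
    """Return (worst-group accuracy, unweighted mean of per-group accuracies)."""
    acc = group_accuracies(preds, labels, groups)
    values = list(acc.values())
    return min(values), sum(values) / len(values)
```

Because the mean is taken over groups rather than over samples, a small minority group counts as much as a large majority group, which is why these metrics expose reliance on spurious features that plain average accuracy can hide.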
Abstract
Large foundation models trained on extensive datasets demonstrate strong zero-shot capabilities in various domains. To replicate their success when data and model size are constrained, knowledge distillation has become an established tool for transferring knowledge from foundation models to small student networks. However, the effectiveness of distillation is critically limited by the available training data. This work addresses the common practical issue of covariate shift in knowledge distillation, where spurious features appear during training but not at test time. We ask the question: when these spurious features are unknown, yet a robust teacher is available, is it possible for a student to also become robust to them? We address this problem by introducing a novel diffusion-based data augmentation strategy that generates images by maximizing the disagreement between the teacher and…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
