Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
Andy Zhou, Jindong Wang, Yu-Xiong Wang, Haohan Wang

TL;DR
This paper introduces a lightweight framework combining knowledge distillation and data augmentation to significantly enhance out-of-distribution robustness in vision models, leveraging robust foundation models as teachers.
Contribution
The paper shows that large pretrained foundation models serve as effective teachers for robustness, and it proposes Discrete Adversarial Distillation (DAD), which uses a robust teacher to generate adversarial examples and a VQGAN to discretize them for data augmentation.
Findings
Strong out-of-distribution robustness gains
Improved clean accuracy across architectures
Minor computational overhead
Abstract
We propose a conceptually simple and lightweight framework for improving the robustness of vision models through the combination of knowledge distillation and data augmentation. We address the conjecture that larger models do not make for better teachers by showing strong gains in out-of-distribution robustness when distilling from pretrained foundation models. Following this finding, we propose Discrete Adversarial Distillation (DAD), which leverages a robust teacher to generate adversarial examples and a VQGAN to discretize them, creating more informative samples than standard data augmentation techniques. We provide a theoretical framework for the use of a robust teacher in the knowledge distillation with data augmentation setting and demonstrate strong gains in out-of-distribution robustness and clean accuracy across different student architectures. Notably, our method adds minor…
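The pipeline the abstract describes (the teacher produces an adversarial example, the example is discretized, and the student is distilled on the result) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes tiny linear classifiers in place of real networks, a per-coordinate nearest-codebook lookup in place of a VQGAN, a single signed finite-difference step in place of a real attack, and an adversarial objective that raises the teacher's cross-entropy on the true label. All function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def linear_logits(weights, x):
    """Toy linear classifier: one weight row per class (assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cross_entropy(weights, x, label):
    """Cross-entropy of the classifier's prediction against the true label."""
    return -math.log(softmax(linear_logits(weights, x))[label])

def adversarial_step(teacher_w, x, label, step=0.5, h=1e-4):
    """One signed ascent step on the teacher's loss (assumed objective).
    The gradient is estimated by forward finite differences."""
    base = cross_entropy(teacher_w, x, label)
    moved = []
    for i, xi in enumerate(x):
        bumped = list(x)
        bumped[i] += h
        g = (cross_entropy(teacher_w, bumped, label) - base) / h
        moved.append(xi + step * ((g > 0) - (g < 0)))
    return moved

def quantize(x, codebook):
    """Snap each coordinate to its nearest codebook value.
    Hypothetical stand-in for VQGAN discretization."""
    return [min(codebook, key=lambda c: abs(c - xi)) for xi in x]

def distill_loss(teacher_w, student_w, x, temperature=2.0, eps=1e-12):
    """KL(teacher || student) on temperature-softened predictions."""
    p = softmax(linear_logits(teacher_w, x), temperature)
    q = softmax(linear_logits(student_w, x), temperature)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy inputs (all values are illustrative assumptions).
teacher_w = [[1.0, 0.0], [0.0, 1.0]]
student_w = [[0.5, 0.1], [0.2, 0.4]]
codebook = [0.0, 0.25, 0.5, 0.75, 1.0]
x, label = [0.8, 0.2], 0

x_adv = adversarial_step(teacher_w, x, label)    # teacher-guided perturbation
x_disc = quantize(x_adv, codebook)               # discretized augmented sample
loss = distill_loss(teacher_w, student_w, x_disc)  # student training signal
```

In a real setting the student would be updated by gradient descent on this loss, typically alongside a loss on the clean sample; the sketch stops at computing the training signal.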
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Knowledge Distillation
