Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
Yunyi Xuan, Weijie Chen, Shicai Yang, Di Xie, Luojun Lin, Yueting Zhuang

TL;DR
This paper introduces a data-free distillation method for vision-language models that uses prompt diversification to synthesize diverse surrogate images, enhancing out-of-distribution generalization without relying on large datasets.
Contribution
It proposes three novel prompt diversification techniques—Mix-Prompt, Random-Prompt, and Contrastive-Prompt—for synthesizing diverse images to improve distribution-agnostic model adaptation.
Findings
Contrastive-Prompt achieves the best out-of-distribution generalization.
The methods effectively synthesize diverse surrogate images.
Student models show improved robustness in OOD scenarios.
Abstract
Data-Free Knowledge Distillation (DFKD) has shown great potential in creating a compact student model while alleviating the dependency on real training data by synthesizing surrogate data. However, prior arts are seldom discussed under distribution shifts, which may be vulnerable in real-world applications. Recent Vision-Language Foundation Models, e.g., CLIP, have demonstrated remarkable performance in zero-shot out-of-distribution generalization, yet consuming heavy computation resources. In this paper, we discuss the extension of DFKD to Vision-Language Foundation Models without access to the billion-level image-text datasets. The objective is to customize a student model for distribution-agnostic downstream tasks with given category concepts, inheriting the out-of-distribution generalization capability from the pre-trained foundation models. In order to avoid generalization…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Contrastive Language-Image Pre-training · Knowledge Distillation
