Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification

Yunyi Xuan; Weijie Chen; Shicai Yang; Di Xie; Luojun Lin; Yueting Zhuang

arXiv:2407.15155·cs.CV·July 23, 2024

TL;DR

This paper introduces a data-free distillation method for vision-language models that uses prompt diversification to synthesize diverse surrogate images, enhancing out-of-distribution generalization without relying on large datasets.

Contribution

It proposes three novel prompt diversification techniques—Mix-Prompt, Random-Prompt, and Contrastive-Prompt—for synthesizing diverse images to improve distribution-agnostic model adaptation.

Findings

01. Contrastive-Prompt achieves the best out-of-distribution generalization.

02. The methods effectively synthesize diverse surrogate images.

03. Student models show improved robustness in OOD scenarios.

Abstract

Data-Free Knowledge Distillation (DFKD) has shown great potential in creating a compact student model while alleviating the dependency on real training data by synthesizing surrogate data. However, prior work has seldom been studied under distribution shifts, which may leave it vulnerable in real-world applications. Recent Vision-Language Foundation Models, e.g., CLIP, have demonstrated remarkable performance in zero-shot out-of-distribution generalization, yet they consume heavy computational resources. In this paper, we discuss the extension of DFKD to Vision-Language Foundation Models without access to the billion-level image-text datasets. The objective is to customize a student model for distribution-agnostic downstream tasks with given category concepts, inheriting the out-of-distribution generalization capability from the pre-trained foundation models. In order to avoid generalization…

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Contrastive Language-Image Pre-training · Knowledge Distillation