A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance

Zeyi Huang; Andy Zhou; Zijian Lin; Mu Cai; Haohan Wang; Yong Jae Lee

arXiv:2309.12530·cs.CV·September 25, 2023·2 cites

TL;DR

This paper introduces RISE, a novel domain generalization method that distills knowledge from CLIP using text guidance to improve model robustness on unseen domains.

Contribution

It presents a regularization technique that uses CLIP's text and image representations for domain generalization, reportedly the first use of a large vision-language model for this purpose.

Findings

01. RISE outperforms state-of-the-art methods on benchmark datasets.

02. Text-guided regularization enhances model generalization.

03. First application of vision-language knowledge distillation for domain generalization.

Abstract

Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain. In this paper, we propose a novel approach for domain generalization that leverages recent advances in large vision-language models, specifically a CLIP teacher model, to train a smaller model that generalizes to unseen domains. The key technical contribution is a new type of regularization that requires the student's learned image representations to be close to the teacher's learned text representations obtained from encoding the corresponding text descriptions of images. We introduce two designs of the loss function, absolute and relative distance, which provide specific guidance on how the training process of the student model should be regularized. We evaluate our proposed method, dubbed RISE…
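The two loss designs named in the abstract can be illustrated with a toy sketch. The visible abstract does not give the formulas, so everything below is an assumption made for illustration: cosine distance stands in for the "absolute" loss, a squared difference of similarities to a set of anchor text embeddings stands in for the "relative" loss, and the function names, the anchor set, and the 0.5 weight are all hypothetical. It is not the authors' implementation, which lives in the official PyTorch repository.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are not handled)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def absolute_distance(student_img, teacher_txt):
    """Assumed 'absolute' loss: pull the student's image embedding
    directly toward the teacher's text embedding of the description."""
    return 1.0 - cosine(student_img, teacher_txt)


def relative_distance(student_img, teacher_txt, anchor_txts):
    """Assumed 'relative' loss: the student embedding should relate to a
    set of anchor text embeddings the same way the teacher's text
    embedding does (mean squared difference of cosine similarities)."""
    diffs = [cosine(student_img, a) - cosine(teacher_txt, a) for a in anchor_txts]
    return sum(d * d for d in diffs) / len(diffs)


# Toy 3-dimensional embeddings; real CLIP embeddings are much larger.
student_img = [0.9, 0.1, 0.0]   # student's image embedding
teacher_txt = [1.0, 0.0, 0.0]   # teacher's text embedding of the caption
anchor_txts = [                 # hypothetical anchor text embeddings
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

abs_loss = absolute_distance(student_img, teacher_txt)
rel_loss = relative_distance(student_img, teacher_txt, anchor_txts)

# Hypothetical weighting; in practice this would be added to the usual
# classification loss as a regularizer.
regularizer = abs_loss + 0.5 * rel_loss
print(abs_loss, rel_loss, regularizer)
```

Both terms are zero when the student's image embedding points in the same direction as the teacher's text embedding, and grow as the two drift apart, which is the sense in which the text representation regularizes the student.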

Code & Models

Repositories

oodbag/rise (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Knowledge Distillation · Contrastive Language-Image Pre-training