Choosing Wisely and Learning Deeply: Selective Cross-Modality Distillation via CLIP for Domain Generalization

Jixuan Leng; Yijiang Li; Haohan Wang

arXiv:2311.15145 · cs.CV · April 24, 2024 · 1 citation

Open access · 1 repository

TL;DR

This paper presents SCMD, a method that uses CLIP as a teacher for domain generalization. It distills knowledge into a more efficient student model on selected hard-to-learn samples, improving robustness on unseen domains.

Contribution

The paper introduces a selection framework that identifies hard-to-learn samples for distillation, and a cross-modality module that combines the student's projected features with CLIP text embeddings to align similarity distributions.

Findings

01. SCMD achieves state-of-the-art performance on multiple benchmarks.

02. The selection strategy effectively identifies hard-to-learn samples.

03. Theoretical analysis supports the effectiveness of the selection method.

Abstract

Domain Generalization (DG), a crucial research area, seeks to train models across multiple domains and test them on unseen ones. In this paper, we introduce a novel approach, namely, Selective Cross-Modality Distillation for Domain Generalization (SCMD). SCMD leverages the capabilities of large vision-language models, specifically CLIP, to train a more efficient model, ensuring it acquires robust generalization capabilities across unseen domains. Our primary contribution is a unique selection framework strategically designed to identify hard-to-learn samples for distillation. In parallel, we introduce a novel cross-modality module that seamlessly combines the projected features of the student model with the text embeddings from CLIP, ensuring the alignment of similarity distributions. We assess SCMD's performance on various benchmarks, where it empowers a ResNet50 to deliver…
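
The two ideas in the abstract, selecting hard-to-learn samples and aligning similarity distributions against CLIP text embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the selection criterion or the loss, so the sketch assumes that "hard" means high per-sample cross-entropy under the student, and that alignment is measured with a KL divergence. The function names, `keep_ratio`, `temperature`, and the projection matrix `proj` are hypothetical, and NumPy arrays stand in for real CLIP and student features.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def select_hard_samples(student_logits, labels, keep_ratio=0.5):
    # ASSUMPTION: "hard-to-learn" = largest per-sample cross-entropy.
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    k = max(1, int(round(keep_ratio * len(labels))))
    return np.argsort(-ce)[:k]  # indices of the k highest-loss samples

def cross_modality_kd_loss(student_feats, proj, teacher_img_emb, text_emb,
                           temperature=1.0):
    # Project student features into the (assumed) shared embedding space.
    s = l2_normalize(student_feats @ proj)
    t = l2_normalize(teacher_img_emb)
    txt = l2_normalize(text_emb)
    # Similarity distributions over class text embeddings.
    p_student = softmax(s @ txt.T / temperature)
    p_teacher = softmax(t @ txt.T / temperature)
    # ASSUMPTION: KL(teacher || student), averaged over the batch.
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=1)
    return kl.mean()

# Toy data: 4 samples, 3 classes; samples 1 and 3 are misclassified.
labels = np.array([0, 1, 2, 0])
student_logits = 5.0 * np.eye(3)[np.array([0, 2, 2, 1])]
hard_idx = select_hard_samples(student_logits, labels, keep_ratio=0.5)

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(3, 8))          # stand-in for CLIP text embeddings
teacher_img_emb = rng.normal(size=(4, 8))   # stand-in for CLIP image embeddings
student_feats = rng.normal(size=(4, 16))    # stand-in for student features
proj = rng.normal(size=(16, 8))             # hypothetical linear projection

# Distill only on the selected hard samples.
loss = cross_modality_kd_loss(student_feats[hard_idx], proj,
                              teacher_img_emb[hard_idx], text_emb)
```

In this sketch the distillation loss is computed only on `hard_idx`, so samples the student already handles well contribute nothing to the distillation term. A real training loop would presumably combine this term with a standard classification loss, but the abstract does not give the weighting.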

Code & Models

Repositories

SeanLeng1/SCMD
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling

Methods: Contrastive Language-Image Pre-training