Domain Adaptation with a Single Vision-Language Embedding

Mohammad Fahes; Tuan-Hung Vu; Andrei Bursuc; Patrick Pérez; Raoul de Charette

arXiv:2410.21361·cs.CV·October 30, 2024

TL;DR

This paper introduces a domain adaptation framework that replaces full target data with a single vision-language (VL) embedding, obtained from a language prompt (zero-shot) or a single unlabeled target image (one-shot), and uses CLIP-guided feature-level style augmentation to adapt the model.

Contribution

The work proposes a domain adaptation method that relies on a single VL embedding and a feature augmentation technique, prompt/photo-driven instance normalization (PIN), removing the need for full target data during training.

Findings

01

Outperforms relevant baselines in semantic segmentation tasks.

02

Effective in zero-shot and one-shot domain adaptation scenarios.

03

Mines multiple visual styles from a single VL embedding, which can come from a language prompt, a partially optimized prompt, or a single unlabeled target image.

Abstract

Domain adaptation has been extensively investigated in computer vision but still requires access to target data at the training time, which might be difficult to obtain in some uncommon conditions. In this paper, we present a new framework for domain adaptation relying on a single Vision-Language (VL) latent embedding instead of full target data. First, leveraging a contrastive language-image pre-training model (CLIP), we propose prompt/photo-driven instance normalization (PIN). PIN is a feature augmentation method that mines multiple visual styles using a single target VL latent embedding, by optimizing affine transformations of low-level source features. The VL embedding can come from a language prompt describing the target domain, a partially optimized language prompt, or a single unlabeled target image. Second, we show that these mined styles (i.e., augmentations) can be used for…
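The abstract describes PIN as optimizing affine transformations of low-level source features so that they reflect a single target embedding. The sketch below illustrates that idea only, under strong assumptions that are not taken from the paper's code: the affine transform is assumed to be an AdaIN-style per-channel rescaling, `toy_embed` is a hypothetical stand-in for the frozen CLIP image encoder, the loss is assumed to be a cosine distance, and gradients are estimated by finite differences to keep the example dependency-free. All function names are illustrative.

```python
import math

def channel_stats(channel, eps=1e-5):
    """Mean and (eps-stabilized) standard deviation of one flattened channel."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)

def restyle(features, mu, sigma):
    """Assumed affine transform: normalize each channel, then scale by sigma and shift by mu."""
    out = []
    for channel, m, s in zip(features, mu, sigma):
        mean, std = channel_stats(channel)
        out.append([s * (v - mean) / std + m for v in channel])
    return out

def toy_embed(features):
    """Hypothetical encoder stand-in: unit-norm vector of channel means and stds."""
    stats = [channel_stats(c) for c in features]
    vec = [m for m, _ in stats] + [s for _, s in stats]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def cosine_distance(a, b):
    """1 - cosine similarity; both inputs are assumed to be unit-norm."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def mine_style(features, target, steps=100, lr=0.5, h=1e-4):
    """Optimize (mu, sigma) so the restyled features embed close to the target."""
    n = len(features)
    stats = [channel_stats(c) for c in features]
    # Start from the source statistics, i.e. from the unmodified source style.
    params = [m for m, _ in stats] + [s for _, s in stats]

    def loss(p):
        return cosine_distance(toy_embed(restyle(features, p[:n], p[n:])), target)

    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            # Central finite difference in place of backpropagation.
            up = params[:i] + [params[i] + h] + params[i + 1:]
            down = params[:i] + [params[i] - h] + params[i + 1:]
            grad.append((loss(up) - loss(down)) / (2 * h))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params[:n], params[n:], loss(params)

# Two channels of four values each, standing in for a low-level feature map.
source = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 3.0, 3.0]]
raw_target = [0.2, 0.1, 1.0, 2.0]  # made-up target embedding
norm = math.sqrt(sum(v * v for v in raw_target))
target = [v / norm for v in raw_target]

initial = cosine_distance(toy_embed(source), target)
mu, sigma, final = mine_style(source, target)
styled = restyle(source, mu, sigma)
print(f"cosine distance before: {initial:.4f}, after: {final:.4f}")
```

Because the optimization starts from each source feature map's own statistics, running it on different source images would yield different (mu, sigma) pairs for the same target embedding, which is one plausible reading of how a single embedding can produce multiple styles.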

Code & Models

Repositories

astra-vision/poda
pytorch

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Instance Normalization