A Simple Recipe for Language-guided Domain Generalized Segmentation

Mohammad Fahes; Tuan-Hung Vu; Andrei Bursuc; Patrick Pérez; Raoul de Charette

arXiv:2311.17922 · cs.CV · April 3, 2024 · 1 citation

TL;DR

This paper proposes a straightforward method for improving semantic segmentation generalization to new domains by leveraging language-guided style augmentation and minimal fine-tuning of CLIP, achieving state-of-the-art results.

Contribution

It introduces a simple, effective framework that uses language as a source of randomization for domain generalization in segmentation tasks, with minimal fine-tuning of CLIP.

Findings

01. Achieves state-of-the-art results on multiple benchmarks.

02. Effective use of language-guided style augmentation.

03. Minimal fine-tuning preserves CLIP robustness.

Abstract

Generalization to new domains not seen during training is one of the long-standing challenges in deploying neural networks in real-world applications. Existing generalization techniques either necessitate external images for augmentation, and/or aim at learning invariant representations by imposing various alignment constraints. Large-scale pretraining has recently shown promising generalization capabilities, along with the potential of binding different modalities. For instance, the advent of vision-language models like CLIP has opened the doorway for vision models to exploit the textual modality. In this paper, we introduce a simple framework for generalizing semantic segmentation networks by employing language as the source of randomization. Our recipe comprises three key ingredients: (i) the preservation of the intrinsic CLIP robustness through minimal fine-tuning, (ii)…
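
The idea of using language as the source of randomization can be illustrated with a small sketch. The abstract above is truncated and does not spell out the mechanism, so everything below is an assumption made for illustration: the sketch uses AdaIN-style re-normalization of feature statistics, a common way to do style augmentation in feature space, and it is not presented as the authors' implementation. The names `restyle`, `randomize_style`, and `STYLE_BANK`, the prompts, and the numeric statistics are all hypothetical; in a real pipeline the target statistics would be derived from a vision-language model such as CLIP rather than hard-coded.

```python
import random
import statistics


def restyle(values, target_mean, target_std, eps=1e-5):
    """Re-normalize one feature channel to the given mean and std (AdaIN-style)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Whiten the channel, then apply the target "style" statistics.
    return [(v - mean) / (std + eps) * target_std + target_mean for v in values]


# Hypothetical style bank: prompt -> (mean, std) for a single channel.
# These numbers are placeholders, not values from the paper.
STYLE_BANK = {
    "a photo in the fog": (0.2, 0.5),
    "a photo at night": (-0.4, 0.8),
    "a snowy photo": (0.6, 0.3),
}


def randomize_style(values, rng):
    """Pick a random prompt and restyle the channel with its statistics."""
    prompt = rng.choice(sorted(STYLE_BANK))
    target_mean, target_std = STYLE_BANK[prompt]
    return prompt, restyle(values, target_mean, target_std)


channel = [0.0, 1.0, 2.0, 3.0]  # toy activations for one channel
prompt, styled = randomize_style(channel, random.Random(0))
print(prompt, [round(v, 3) for v in styled])
```

Only the first- and second-order statistics of the channel change; the relative ordering of activations, which carries the content, is preserved. Drawing a different prompt at each training step is what would make the text the source of randomization in this sketch.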

Code & Models

Repositories

astra-vision/FAMix (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Contrastive Language-Image Pre-training