CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings
Cristina Mata, Kanchana Ranasinghe, Michael S. Ryoo

TL;DR
This paper introduces CoPT, a novel unsupervised domain adaptation method for segmentation that leverages domain-agnostic text embeddings to learn invariant features, achieving state-of-the-art results across multiple benchmarks.
Contribution
The paper proposes a new Covariance-based Pixel-Text loss using domain-agnostic text embeddings and a novel LLM Domain Template process for improved segmentation adaptation.
Findings
Achieves state-of-the-art performance on four segmentation benchmarks.
Demonstrates effectiveness of domain-agnostic text embeddings in UDA.
Outperforms previous methods in unsupervised domain adaptation for segmentation.
Abstract
Unsupervised domain adaptation (UDA) involves learning class semantics from labeled data within a source domain that generalize to an unseen target domain. UDA methods are particularly impactful for semantic segmentation, where annotations are more difficult to collect than in image classification. Despite recent advances in large-scale vision-language representation learning, UDA methods for segmentation have not taken advantage of the domain-agnostic properties of text. To address this, we present a novel Covariance-based Pixel-Text loss, CoPT, that uses domain-agnostic text embeddings to learn domain-invariant features in an image segmentation encoder. The text embeddings are generated through our LLM Domain Template process, where an LLM is used to generate source and target domain descriptions that are fed to a frozen CLIP model and combined. In experiments on four benchmarks we…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications
