Dual-Foundation Models for Unsupervised Domain Adaptation

Yerin Cheon; Aruna Balasubramanian; Francois Rameau

arXiv:2605.03365 · cs.CV · May 6, 2026

TL;DR

This paper introduces a dual-foundation model approach to unsupervised domain adaptation (UDA) for semantic segmentation, using SAM and DINOv3 to improve learning from unlabeled real images.

Contribution

The paper proposes a framework that combines SAM and DINOv3 to address two limitations of UDA for semantic segmentation: reliance on high-confidence pseudo-labels and bias in class prototypes.

Findings

01 Achieves +1.3% mIoU on GTA-to-Cityscapes

02 Achieves +1.4% mIoU on SYNTHIA-to-Cityscapes

03 Outperforms strong UDA baselines
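
For readers unfamiliar with the metric behind these numbers, the following is a minimal sketch of mean Intersection-over-Union (mIoU) on flattened label lists. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper. Benchmark evaluations typically accumulate intersections and unions over the whole validation set rather than a single short list.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Average per-class IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both pred and target
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy example (hypothetical labels): six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}%")
```

A reported gain such as "+1.3% mIoU" is a difference in this averaged score, expressed in percentage points.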

Abstract

Semantic segmentation provides pixel-level scene understanding essential for autonomous driving and fine-grained perception tasks. However, training segmentation models requires costly, labor-intensive annotations on real-world datasets. Unsupervised Domain Adaptation (UDA) addresses this by training models on labeled synthetic data and adapting them to unlabeled real images. While conceptually simple, adaptation is challenging due to the domain gap, i.e., differences in visual appearance and scene structure between synthetic and real data. Prior approaches bridge this gap through pixel-level mixing or feature-level contrastive learning. Yet, these techniques suffer from two major limitations: (1) reliance on high-confidence pseudo-labels restricts learning to a subset of the target domain, and (2) prototype-based contrastive methods initialize class prototypes from source-trained…
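
The first limitation named in the abstract can be illustrated with a minimal sketch of confidence-thresholded pseudo-labeling, a common ingredient of self-training UDA pipelines. This is not the paper's method: the function names, the 0.9 threshold, and the toy probabilities are all assumptions chosen for illustration. The point is only that pixels below the threshold receive no label and so contribute no supervision.

```python
from typing import List, Optional


def pseudo_labels(probs: List[List[float]],
                  threshold: float = 0.9) -> List[Optional[int]]:
    """Map per-pixel class probabilities to a label, or None if not confident."""
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda c: p[c])  # argmax class
        labels.append(best if p[best] >= threshold else None)
    return labels


def coverage(labels: List[Optional[int]]) -> float:
    """Fraction of pixels that received a pseudo-label."""
    return sum(label is not None for label in labels) / len(labels)


# Hypothetical softmax outputs for four target-domain pixels, three classes.
probs = [
    [0.97, 0.02, 0.01],  # confident: kept as class 0
    [0.55, 0.40, 0.05],  # ambiguous: ignored
    [0.10, 0.85, 0.05],  # just under the threshold: ignored
    [0.02, 0.03, 0.95],  # confident: kept as class 2
]
labels = pseudo_labels(probs, threshold=0.9)
print(labels, coverage(labels))
```

In this toy case half of the pixels are discarded, which is the sense in which learning is restricted to a subset of the target domain.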
