Open-Vocabulary Domain Generalization in Urban-Scene Segmentation
Dong Zhao, Qi Zang, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong

TL;DR
This paper introduces a new benchmark and method for open-vocabulary domain generalization in urban-scene segmentation, addressing the challenge of recognizing unseen categories across diverse unseen environments.
Contribution
The paper proposes the first benchmark for Open-Vocabulary Domain Generalization in Semantic Segmentation (OVDG-SS) in autonomous driving and introduces S2-Corr, a correlation refinement mechanism designed to improve robustness across domains.
Findings
S2-Corr improves cross-domain segmentation accuracy.
The benchmark covers synthetic-to-real and real-to-real generalization.
The proposed method outperforms existing approaches in efficiency and accuracy.
Abstract
Domain Generalization in Semantic Segmentation (DG-SS) aims to enable segmentation models to perform robustly in unseen environments. However, conventional DG-SS methods are restricted to a fixed set of known categories, limiting their applicability in open-world scenarios. Recent progress in Vision-Language Models (VLMs) has advanced Open-Vocabulary Semantic Segmentation (OV-SS) by enabling models to recognize a broader range of concepts. Yet, these models remain sensitive to domain shifts and struggle to maintain robustness when deployed in unseen environments, a challenge that is particularly severe in urban-driving scenarios. To bridge this gap, we introduce Open-Vocabulary Domain Generalization in Semantic Segmentation (OVDG-SS), a new setting that jointly addresses unseen domains and unseen categories. We introduce the first benchmark for OVDG-SS in autonomous driving, addressing…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
