Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak, Byeongju Woo, Sunghwan Kim, Dae-hwan Kim, Hoseong Kim

TL;DR
This paper presents a novel transformer-based segmentation method that uses textual object queries derived from vision-language models to improve domain generalization in semantic segmentation tasks, especially under extreme domain shifts.
Contribution
Introduces the textual query-driven mask transformer (tqdm), which uses domain-invariant textual embeddings as object queries to improve generalization in semantic segmentation.
Findings
Achieves 68.9 mIoU on GTA5-to-Cityscapes (trained on synthetic GTA5, evaluated on real Cityscapes), surpassing previous methods by 2.5 mIoU.
Utilizes domain-invariant textual embeddings to enhance semantic understanding across domains.
Introduces regularization losses to align visual and textual features, improving model performance.
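The findings above are reported in mIoU (mean intersection over union). For readers unfamiliar with the metric, here is a minimal sketch of how it can be computed for a single pair of label maps. The function name `mean_iou` is hypothetical, and this is not the paper's evaluation code: benchmark protocols typically accumulate intersections and unions over the whole validation set and report the result as a percentage.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in either pred or target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: excluded from the mean
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy example with two classes and four pixels.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.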
Abstract
In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. To leverage the power of textual object queries, we introduce a novel framework named the textual query-driven mask transformer (tqdm). Our tqdm aims to (1) generate textual object queries that maximally encode domain-invariant semantics and (2) enhance the semantic clarity of dense visual features. Additionally, we suggest three regularization losses to improve the efficacy of tqdm by aligning visual and textual features. By utilizing our method, the model can comprehend inherent…
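The idea of using class text embeddings as object queries that group pixels can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names `l2_normalize` and `masks_from_text_queries` are hypothetical, the embeddings are stand-ins rather than outputs of a real vision-language text encoder, and the sketch covers only a final query-to-pixel similarity step. It omits the transformer decoder and the regularization losses mentioned in the abstract, so it should not be read as the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis, eps=1e-8):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def masks_from_text_queries(text_queries, pixel_features):
    """Group pixels by cosine similarity to per-class text queries.

    text_queries:   (K, D) array, one embedding per class name (assumed given).
    pixel_features: (D, H, W) array of dense visual features (assumed given).
    Returns (K, H, W) mask logits and an (H, W) map of predicted class indices.
    """
    q = l2_normalize(text_queries, axis=1)
    f = l2_normalize(pixel_features, axis=0)
    logits = np.einsum("kd,dhw->khw", q, f)  # similarity of every query to every pixel
    return logits, logits.argmax(axis=0)

# Stand-in inputs: 3 classes, 8-dim embeddings, a 4x5 feature map.
rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 8))
features = rng.normal(size=(8, 4, 5))
logits, labels = masks_from_text_queries(queries, features)
```

Because the queries come from text rather than from the training images, the same set of queries can in principle be reused on images from an unseen domain, which is the intuition behind treating them as a domain-invariant basis for pixel grouping.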
Taxonomy
Topics: Handwritten Text Recognition Techniques
