Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

Byeonghyun Pak; Byeongju Woo; Sunghwan Kim; Dae-hwan Kim; Hoseong Kim

arXiv:2407.09033 · cs.CV · August 1, 2024

TL;DR

This paper presents a novel transformer-based segmentation method that uses textual object queries derived from vision-language models to improve domain generalization in semantic segmentation tasks, especially under extreme domain shifts.

Contribution

Introduces the textual query-driven mask transformer (tqdm), which uses domain-invariant text embeddings from vision-language models as object queries to improve generalization in semantic segmentation.

Findings

01. Achieves 68.9 mIoU on GTA5 to Cityscapes, surpassing previous methods by 2.5 mIoU.

02. Utilizes domain-invariant textual embeddings to enhance semantic understanding across domains.

03. Introduces regularization losses to align visual and textual features, improving model performance.
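
For readers unfamiliar with the metric in the first finding, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth labels. The sketch below is a minimal illustration on flat label lists, not the paper's evaluation code; the function name `mean_iou` and the toy labels are assumptions made for this example, and benchmark implementations usually accumulate counts over a whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union (in percent) over classes that occur.

    `pred` and `target` are flat lists of integer class labels, one per pixel.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy labels (hypothetical): six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU: {score:.1f}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the toy score is 50.0. A reported gain of 2.5 mIoU is a difference of 2.5 points on this 0 to 100 scale.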

Abstract

In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. To leverage the power of textual object queries, we introduce a novel framework named the textual query-driven mask transformer (tqdm). Our tqdm aims to (1) generate textual object queries that maximally encode domain-invariant semantics and (2) enhance the semantic clarity of dense visual features. Additionally, we suggest three regularization losses that improve the efficacy of tqdm by aligning visual and textual features. By utilizing our method, the model can comprehend inherent…
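
The idea of using text embeddings as object queries for pixel grouping can be sketched in a few lines. This is a heavily simplified illustration under stated assumptions, not the tqdm architecture: there is no transformer decoder, the three-dimensional "text embeddings" and "pixel features" are hand-written toy vectors rather than outputs of a vision-language model, and the helper names (`l2_normalize`, `mask_logits`, `assign_pixels`) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mask_logits(text_queries, pixel_features):
    """Return a K x N matrix of cosine similarities.

    Row k scores every pixel against the query for class k, so each row
    acts as a soft mask for that class.
    """
    queries = [l2_normalize(q) for q in text_queries]
    pixels = [l2_normalize(p) for p in pixel_features]
    return [[sum(a * b for a, b in zip(q, p)) for p in pixels] for q in queries]


def assign_pixels(logits):
    """Label each pixel with the index of its highest-scoring query."""
    num_pixels = len(logits[0])
    return [
        max(range(len(logits)), key=lambda k: logits[k][n])
        for n in range(num_pixels)
    ]


# Toy stand-ins (assumptions): one query per class name, four pixels.
class_names = ["road", "car", "sky"]
text_queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel_features = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.8],
    [0.6, 0.3, 0.1],
]

logits = mask_logits(text_queries, pixel_features)
labels = assign_pixels(logits)
print([class_names[k] for k in labels])
```

Because the queries come from class names rather than from training images, the same queries can be reused when the visual domain changes, which is the intuition behind treating them as a domain-invariant basis. Aligning visual and textual features, as the regularization losses aim to do, would correspond to pushing each pixel's similarity toward its ground-truth class row in `logits`.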

Code & Models

Repositories

ByeongHyunPak/tqdm (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques