Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments

Meng Yu; Luojie Yang; Xunjie He; Yi Yang; Yufeng Yue

arXiv:2410.06626 · cs.CV · October 10, 2024

TL;DR

Open-RGBT introduces an open-vocabulary RGB-T semantic segmentation approach that leverages visual prompts and CLIP for improved scene understanding in diverse, real-world environments.

Contribution

The paper proposes a novel open-vocabulary RGB-T segmentation model using visual prompts and CLIP to enhance category recognition and semantic consistency.

Findings

01 Outperforms existing methods in challenging scenarios
02 Effectively handles heterogeneous RGB and thermal data
03 Achieves superior accuracy in real-world environments

Abstract

Semantic segmentation is a critical technique for effective scene understanding. Traditional RGB-T semantic segmentation models often struggle to generalize across diverse scenarios due to their reliance on pretrained models and predefined categories. Recent advancements in Visual Language Models (VLMs) have facilitated a shift from closed-set to open-vocabulary semantic segmentation methods. However, these models face challenges in dealing with intricate scenes, primarily due to the heterogeneity between RGB and thermal modalities. To address this gap, we present Open-RGBT, a novel open-vocabulary RGB-T semantic segmentation model. Specifically, we obtain instance-level detection proposals by incorporating visual prompts to enhance category understanding. Additionally, we employ the CLIP model to assess image-text similarity, which helps correct semantic consistency and mitigates…
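
The abstract says only that CLIP is used to assess image-text similarity in order to correct semantic consistency; it does not spell out the rule. As a rough illustration of that idea, and not the authors' implementation, the sketch below scores each proposal's region embedding against a set of text-label embeddings by cosine similarity and overrides the detector's label only when one label clearly wins. The function name `relabel_proposals`, the `temperature` and `min_conf` values, and the override rule are all assumptions made for this example. In practice the embeddings would come from a CLIP image encoder and text encoder; here they are toy vectors so the block runs with NumPy alone.

```python
import numpy as np


def relabel_proposals(region_emb, text_emb, labels, detector_labels,
                      temperature=0.01, min_conf=0.5):
    """Hypothetical helper: re-check detector labels with image-text similarity.

    region_emb:      (num_regions, dim) embeddings of cropped proposals
    text_emb:        (num_labels, dim) embeddings of the label prompts
    labels:          list of num_labels category names
    detector_labels: list of num_regions labels proposed by the detector
    temperature, min_conf: illustrative values, not taken from the paper
    """
    region_emb = np.asarray(region_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)

    # L2-normalize so the dot product equals cosine similarity.
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = r @ t.T  # shape: (num_regions, num_labels)

    # Numerically stable softmax over labels for each region.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    best = probs.argmax(axis=1)
    corrected = []
    for i, det_label in enumerate(detector_labels):
        # Assumed rule: trust the similarity score only when it is confident,
        # otherwise keep whatever the detector proposed.
        if probs[i, best[i]] >= min_conf:
            corrected.append(labels[best[i]])
        else:
            corrected.append(det_label)
    return corrected, probs


if __name__ == "__main__":
    labels = ["person", "car", "bicycle"]
    text_emb = np.eye(3)                    # toy stand-ins for text embeddings
    region_emb = [[0.9, 0.1, 0.0],          # clearly closest to "person"
                  [0.4, 0.4, 0.4]]          # ambiguous across all labels
    corrected, probs = relabel_proposals(
        region_emb, text_emb, labels, detector_labels=["car", "bicycle"])
    print(corrected)
```

With these toy inputs the first proposal is relabeled because one label dominates, while the ambiguous second proposal keeps its detector label. A real pipeline would also have to decide how thermal crops are embedded, which this sketch does not address.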

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training