SegRGB-X: General RGB-X Semantic Segmentation Model

Jiong Liu; Yingjie Xu; Xingcheng Zhou; Rui Song; Walter Zimmer; Alois Knoll; Hu Cao

arXiv:2603.28023 · cs.CV · March 31, 2026

TL;DR

SegRGB-X is a universal framework for RGB-X semantic segmentation that handles multiple complementary modalities (event, thermal, depth, polarization, and light field) in a single model and reports state-of-the-art results across five datasets.

Contribution

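The paper presents a universal segmentation framework built on three components: Modality-aware CLIP (MA-CLIP) for modality-specific scene understanding guidance, Modality-aligned Embeddings for capturing fine-grained features, and the Domain-specific Refinement Module (DSRM) for dynamic feature adjustment.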

Findings

01

Achieves an mIoU of 65.03% across five diverse datasets.

02

Outperforms specialized multi-modal methods.

03

Demonstrates effectiveness across five complementary sensor modalities: event, thermal, depth, polarization, and light field.
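
The headline metric in the findings above is mean Intersection over Union (mIoU). As a reminder of what that number measures, here is a minimal sketch of the standard per-class IoU average over flattened label lists. The function name `mean_iou` and the toy labels are hypothetical, and the paper's exact evaluation protocol (ignore labels, dataset-level accumulation, how scores are combined across datasets) is not stated on this page, so this should not be read as the authors' evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in either pred or target.

    pred and target are flat sequences of integer class ids of equal length.
    Classes absent from both are skipped (an assumption of this sketch).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5, which would be reported as 50% mIoU.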

Abstract

Semantic segmentation across arbitrary sensor modalities faces significant challenges due to diverse sensor characteristics, and the traditional configurations for this task result in redundant development efforts. We address these challenges by introducing a universal arbitrary-modal semantic segmentation framework that unifies segmentation across multiple modalities. Our approach features three key innovations: (1) the Modality-aware CLIP (MA-CLIP), which provides modality-specific scene understanding guidance through LoRA fine-tuning; (2) Modality-aligned Embeddings for capturing fine-grained features; and (3) the Domain-specific Refinement Module (DSRM) for dynamic feature adjustment. Evaluated on five diverse datasets with different complementary modalities (event, thermal, depth, polarization, and light field), our model surpasses specialized multi-modal methods and achieves…
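
The abstract states that MA-CLIP is obtained through LoRA fine-tuning. For readers unfamiliar with LoRA, the following is a minimal, dependency-free sketch of the general technique: a frozen weight matrix W is augmented with a trainable low-rank update, giving an effective weight W + (alpha / r) * B A. The class name `LoRALinear`, the rank, the scaling, and the initialization are illustrative assumptions; which CLIP layers the authors adapt and with what hyperparameters is not given on this page.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Generic LoRA-adapted linear map (illustrative, not the paper's code)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, shape (out_dim, in_dim)
        out_dim, in_dim = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A gets a small random init and B starts at zero, so the adapted
        # layer initially matches the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)  # low-rank update, shape (out_dim, in_dim)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted weight to a single input vector x."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
before = layer.forward([1.0, 1.0])  # identical to the frozen layer's output

# Simulate a training step that changed only the low-rank factors.
layer.A = [[0.5, 0.5]]
layer.B = [[1.0], [0.0]]
after = layer.forward([1.0, 1.0])
```

Only `A` and `B` would be trained in practice, which is why LoRA is a common way to specialize a large pretrained model such as CLIP at low cost; a separate set of low-rank factors per modality is one plausible reading of "modality-aware", but that detail is an assumption here.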
