Delivering Arbitrary-Modal Semantic Segmentation
Jiaming Zhang, Ruiping Liu, Hao Shi, Kailun Yang, Simon Reiß, Kunyu Peng, Haodong Fu, Kaiwei Wang, Rainer Stiefelhagen

TL;DR
This paper introduces a new benchmark and a flexible model for semantic segmentation that effectively fuses an arbitrary number of modalities, improving robustness especially under challenging weather and sensor failure conditions.
Contribution
The paper presents the DeLiVER benchmark for arbitrary-modal segmentation and the CMNeXt model, enabling scalable fusion of multiple modalities with minimal additional parameters.
Findings
CMNeXt achieves state-of-the-art results on six benchmarks.
The DeLiVER dataset includes four severe weather conditions and five sensor failure cases.
On DeLiVER, quad-modal CMNeXt improves mIoU by 9.10% over the mono-modal baseline.
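For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from per-pixel class labels. It is not the paper's evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that occur in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    Skipping classes with an empty union is an assumption of this sketch.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (made-up labels).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.2f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.50.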
Abstract
Multimodal fusion can make semantic segmentation more robust. However, fusing an arbitrary number of modalities remains underexplored. To delve into this problem, we create the DeLiVER arbitrary-modal segmentation benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. Aside from this, we provide this dataset in four severe weather conditions as well as five sensor failure cases to exploit modal complementarity and resolve partial outages. To make this possible, we present the arbitrary cross-modal segmentation model CMNeXt. It encompasses a Self-Query Hub (SQ-Hub) designed to extract effective information from any modality for subsequent fusion with the RGB representation and adds only negligible amounts of parameters (~0.01M) per additional modality. On top, to efficiently and flexibly harvest discriminative cues from the auxiliary modalities, we introduce the simple…
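To make the idea of fusing an arbitrary number of auxiliary modalities with an RGB representation more concrete, here is a toy sketch. It is not the SQ-Hub or any other part of CMNeXt, whose internals are not described on this page. The function `fuse`, the scoring rule (mean absolute activation), and the softmax weighting are all hypothetical choices, picked only to show a fusion step that accepts any number of inputs and down-weights a silent sensor.

```python
import math


def fuse(rgb, auxiliaries):
    """Fuse one RGB feature vector with any number of auxiliary vectors.

    Hypothetical scheme: each auxiliary modality scores itself, the scores
    are normalised with a softmax, and the weighted sum is added to RGB.
    """
    if not auxiliaries:
        return list(rgb)  # mono-modal fallback: nothing to fuse

    # Self-derived score per modality (assumption: mean absolute activation).
    scores = [sum(abs(v) for v in feat) / len(feat) for feat in auxiliaries]

    # Softmax over modalities, shifted by the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of auxiliary features, added residually to the RGB feature.
    fused = list(rgb)
    for w, feat in zip(weights, auxiliaries):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused


rgb = [1.0, 2.0]
depth = [0.5, 0.5]
lidar = [0.0, 0.0]  # simulated sensor outage: all-zero features

out_two = fuse(rgb, [depth, lidar])  # two auxiliary modalities
out_none = fuse(rgb, [])             # RGB only
print(out_two, out_none)
```

The same call works for zero, one, or many auxiliary inputs, and the all-zero `lidar` vector receives the smaller weight. A learned model would replace the hand-written score with trainable components.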
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
