Cross-modal State Space Modeling for Real-time RGB-thermal Wild Scene Semantic Segmentation

Xiaodong Guo, Zi'ang Lin, Luwen Hu, Zhihong Deng, Tong Liu, and Wujie Zhou

arXiv:2506.17869 · cs.CV · June 24, 2025

TL;DR

This paper introduces CM-SSM, an efficient cross-modal state space model for real-time RGB-thermal semantic segmentation in wild environments, achieving high accuracy with lower computational cost than Transformer-based methods.

Contribution

The paper proposes a novel cross-modal state space modeling approach that reduces computational complexity and improves segmentation performance in resource-constrained settings.

Findings

01. Achieves state-of-the-art results on the CART dataset.

02. Uses fewer parameters and incurs lower computational cost.

03. Generalizes well to the PST900 dataset.

Abstract

The integration of RGB and thermal data can significantly improve semantic segmentation performance in wild environments for field robots. Nevertheless, multi-source data processing (e.g. Transformer-based approaches) imposes significant computational overhead, presenting challenges for resource-constrained systems. To resolve this critical limitation, we introduced CM-SSM, an efficient RGB-thermal semantic segmentation architecture leveraging a cross-modal state space modeling (SSM) approach. Our framework comprises two key components. First, we introduced a cross-modal 2D-selective-scan (CM-SS2D) module to establish SSM between RGB and thermal modalities, which constructs cross-modal visual sequences and derives hidden state representations of one modality from the other. Second, we developed a cross-modal state space association (CM-SSA) module that effectively integrates global…
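To make the cross-modal scan idea in the abstract more concrete, here is a minimal sketch of the general principle, not the authors' CM-SS2D module. It assumes, purely for illustration, that tokens are scalars, that the cross-modal sequence is built by interleaving RGB and thermal tokens, and that the state space parameters `a`, `b`, and `c` are fixed scalars (a selective scan would make them input-dependent, and a 2D scan would traverse the image in several directions). The function names `interleave`, `linear_scan`, and `cross_modal_scan` are hypothetical.

```python
from typing import List, Sequence


def interleave(rgb: Sequence[float], thermal: Sequence[float]) -> List[float]:
    """Build one cross-modal sequence r0, t0, r1, t1, ... (assumed ordering)."""
    if len(rgb) != len(thermal):
        raise ValueError("rgb and thermal must contain the same number of tokens")
    seq: List[float] = []
    for r, t in zip(rgb, thermal):
        seq.append(r)
        seq.append(t)
    return seq


def linear_scan(seq: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state space recurrence: h_k = a*h_{k-1} + b*x_k, y_k = c*h_k."""
    h = 0.0
    outputs: List[float] = []
    for x in seq:
        h = a * h + b * x      # the hidden state accumulates every earlier token
        outputs.append(c * h)  # read out the current hidden state
    return outputs


def cross_modal_scan(
    rgb: Sequence[float],
    thermal: Sequence[float],
    a: float = 0.5,
    b: float = 1.0,
    c: float = 1.0,
) -> List[float]:
    """Return the scan outputs at thermal positions only.

    Each thermal position follows at least one RGB token in the interleaved
    sequence, so its hidden state already carries RGB information.
    """
    outputs = linear_scan(interleave(rgb, thermal), a, b, c)
    return outputs[1::2]  # odd indices hold the thermal positions


if __name__ == "__main__":
    # The thermal tokens are all zero, yet the outputs at thermal positions are
    # non-zero because the hidden state was driven by the RGB tokens.
    print(cross_modal_scan([1.0, 0.0], [0.0, 0.0]))
```

The scan visits each token once, so its cost grows linearly with sequence length, which is the usual reason state space models are considered for resource-constrained settings.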

Code & Models

Repositories

xiaodonguo/cmssm (PyTorch, official)
