TL;DR
This paper introduces CM-SSM, an efficient cross-modal state space model for real-time RGB-thermal semantic segmentation in wild environments, achieving high accuracy with lower computational cost than Transformer-based methods.
Contribution
The paper proposes a novel cross-modal state space modeling approach that reduces computational complexity and improves segmentation performance in resource-constrained settings.
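The efficiency claim comes down to how cost grows with sequence length. Self-attention compares every token with every other token, so its cost is quadratic in the number of tokens. A selective scan visits each token once per scan direction, so its cost is linear. The rough multiply-accumulate (MAC) model below is a simplification assumed for illustration. It ignores linear projections, normalization, and hardware effects. The 60 x 80 feature map, channel width, state size, and four scan directions are hypothetical values, not figures from the paper.

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Rough MACs for the two L x L products in self-attention (Q K^T and A V)."""
    return 2 * num_tokens * num_tokens * dim


def selective_scan_macs(num_tokens: int, dim: int, state_dim: int, directions: int = 4) -> int:
    """Rough MACs for a diagonal selective scan: for every token, channel and
    hidden-state entry, one decay multiply, one input multiply and one
    read-out multiply, repeated once per scan direction."""
    return directions * num_tokens * dim * state_dim * 3


# Hypothetical 60 x 80 feature map with 96 channels and a 16-entry state.
tokens = 60 * 80
attn = attention_macs(tokens, dim=96)
scan = selective_scan_macs(tokens, dim=96, state_dim=16)
print(attn / scan)  # → 50.0
```

Doubling the number of tokens quadruples the attention estimate but only doubles the scan estimate. This is why the gap widens at the higher resolutions that dense segmentation needs.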
Findings
Achieves state-of-the-art results on the CART dataset
Uses fewer parameters and has lower computational cost
Demonstrates good generalizability on the PST900 dataset
Abstract
The integration of RGB and thermal data can significantly improve semantic segmentation performance in wild environments for field robots. Nevertheless, multi-source data processing (e.g. Transformer-based approaches) imposes significant computational overhead, presenting challenges for resource-constrained systems. To resolve this critical limitation, we introduced CM-SSM, an efficient RGB-thermal semantic segmentation architecture leveraging a cross-modal state space modeling (SSM) approach. Our framework comprises two key components. First, we introduced a cross-modal 2D-selective-scan (CM-SS2D) module to establish SSM between RGB and thermal modalities, which constructs cross-modal visual sequences and derives hidden state representations of one modality from the other. Second, we developed a cross-modal state space association (CM-SSA) module that effectively integrates global…
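The abstract describes CM-SS2D only at a high level. What follows is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It borrows the four-direction 2D selective scan popularized by VMamba. It also assumes one reading of "deriving the hidden state of one modality from the other": tokens of one modality are written into the hidden state, while tokens of the other modality produce the input-dependent step size Δ and the projections B and C. All function names (`cross_modal_scan`, `cm_ss2d_sketch`), shapes, and parameters are hypothetical. The CM-SSA fusion module is not modeled.

```python
import numpy as np

def softplus(x: np.ndarray) -> np.ndarray:
    # Numerically stable log(1 + exp(x)); keeps step sizes positive.
    return np.logaddexp(0.0, x)

def scan_orders(h: int, w: int) -> list:
    """Four traversal orders of an h x w grid as flat index arrays:
    row-major, reversed row-major, column-major, reversed column-major."""
    idx = np.arange(h * w).reshape(h, w)
    row = idx.reshape(-1)
    col = idx.T.reshape(-1)
    return [row, row[::-1], col, col[::-1]]

def cross_modal_scan(x_drive, x_guide, A, W_delta, W_B, W_C):
    """Diagonal selective scan over a length-L sequence (hypothetical sketch).

    x_drive: (L, D) tokens written into the hidden state.
    x_guide: (L, D) tokens of the other modality; they set delta, B and C.
    A:       (D, N) negative entries, one diagonal state matrix per channel.
    """
    L, D = x_drive.shape
    delta = softplus(x_guide @ W_delta)  # (L, D) per-token step sizes
    B = x_guide @ W_B                    # (L, N) input projection
    C = x_guide @ W_C                    # (L, N) read-out projection
    h = np.zeros_like(A)                 # (D, N) hidden state
    y = np.empty((L, D))
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)  # zero-order-hold decay
        dBx = delta[t][:, None] * B[t][None, :] * x_drive[t][:, None]
        h = dA * h + dBx                    # recurrent state update
        y[t] = h @ C[t]                     # (D,) output for token t
    return y

def cm_ss2d_sketch(feat_drive, feat_guide, params):
    """Scan along four orders and sum the results back on the (H, W) grid."""
    H, W, D = feat_drive.shape
    xa = feat_drive.reshape(H * W, D)
    xb = feat_guide.reshape(H * W, D)
    out = np.zeros((H * W, D))
    for order in scan_orders(H, W):
        y = cross_modal_scan(xa[order], xb[order], *params)
        out[order] += y  # undo the permutation before merging
    return out.reshape(H, W, D)

# Usage with random, hypothetical features and parameters.
rng = np.random.default_rng(0)
H, W, D, N = 4, 5, 8, 4
params = (
    -np.exp(rng.normal(size=(D, N))),  # A: negative so the state decays
    rng.normal(size=(D, D)) * 0.1,     # W_delta
    rng.normal(size=(D, N)) * 0.1,     # W_B
    rng.normal(size=(D, N)) * 0.1,     # W_C
)
rgb = rng.normal(size=(H, W, D))
thermal = rng.normal(size=(H, W, D))
# Thermal tokens fill the state; RGB tokens decide what is kept or read out.
thermal_guided_by_rgb = cm_ss2d_sketch(thermal, rgb, params)
print(thermal_guided_by_rgb.shape)  # → (4, 5, 8)
```

Swapping the two arguments gives the opposite direction, with RGB states guided by thermal tokens. A symmetric design would presumably compute both directions before fusing them. Each scan is causal along its own order. Summing four orders lets every output position see context from the whole grid.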
