Multispectral State-Space Feature Fusion: Bridging Shared and Cross-Parametric Interactions for Object Detection
Jifeng Shen, Haibo Zhan, Shaohua Dong, Xin Zuo, Wankou Yang, Haibin Ling

TL;DR
This paper introduces MS2Fusion, a multispectral feature fusion framework based on state space models that combines shared semantics with cross-modal interactions, improving performance in multispectral object detection and other perception tasks.
Contribution
The paper proposes a dual-path state-space model framework for multispectral feature fusion that improves generalization and scalability compared with existing methods.
Findings
MS2Fusion outperforms state-of-the-art multispectral object detection methods on benchmarks.
It achieves state-of-the-art results on RGB-T semantic segmentation.
The framework is applicable to various multispectral perception tasks.
Abstract
Modern multispectral feature fusion for object detection faces two critical limitations: (1) an excessive preference for local complementary features over cross-modal shared semantics, which hurts generalization; and (2) the trade-off between receptive-field size and computational complexity, which is a critical bottleneck for scalable feature modeling. To address these issues, a novel Multispectral State-Space Feature Fusion framework, dubbed MS2Fusion, is proposed based on the state space model (SSM). It achieves efficient and effective fusion through a dual-path parametric interaction mechanism. Specifically, the first branch, a cross-parameter interaction branch, inherits the strength of cross-attention in mining complementary information by decoding hidden states across modalities within the SSM. The second, a shared-parameter branch, explores cross-modal alignment with joint embedding to…
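The abstract describes the two branches only at a high level. The following NumPy sketch is a toy illustration under stated assumptions, not the authors' implementation. It uses a heavily simplified Mamba-style selective scan, treats the input-dependent readout `C_t` as the query-like term, and interprets "cross-modal hidden state decoding" as one modality's `C_t` reading the other modality's hidden states, by analogy with cross-attention. The shared-parameter branch simply runs both modalities through one set of SSM weights. All function names (`selective_scan`, `decode`, `cross_parameter_branch`, `shared_parameter_branch`, `make_params`), the discretization, and the final summation are hypothetical.

```python
import numpy as np

def selective_scan(x, A, W_B, W_C, dt=0.1):
    """Toy selective SSM scan (hypothetical, heavily simplified Mamba-style).

    x: (T, D) token sequence; A: (D, N) negative diagonal state matrix per channel;
    W_B, W_C: (D, N) projections that produce input-dependent B_t and C_t.
    Returns hidden states H (T, D, N) and readout vectors C (T, N).
    """
    T = x.shape[0]
    decay = np.exp(dt * A)                 # exact exponential discretization of A, in (0, 1)
    C = x @ W_C                            # (T, N): input-dependent readout (query-like)
    h = np.zeros_like(A)
    H = np.empty((T,) + A.shape)
    for t in range(T):
        B_t = x[t] @ W_B                   # (N,): input-dependent input matrix (key-like)
        # h_t = A_bar * h_{t-1} + dt * x_t B_t^T  (simplified Euler step for B)
        h = decay * h + dt * np.outer(x[t], B_t)
        H[t] = h
    return H, C

def decode(H, C):
    """y_t[d] = sum_n H[t, d, n] * C[t, n]: read hidden states with readout C."""
    return np.einsum("tdn,tn->td", H, C)

def cross_parameter_branch(x_rgb, x_tir, p_rgb, p_tir):
    """Hypothetical: each modality's readout decodes the OTHER modality's hidden states."""
    H_rgb, C_rgb = selective_scan(x_rgb, *p_rgb)
    H_tir, C_tir = selective_scan(x_tir, *p_tir)
    return decode(H_tir, C_rgb), decode(H_rgb, C_tir)

def shared_parameter_branch(x_rgb, x_tir, p_shared):
    """Hypothetical: both modalities pass through the same SSM weights (joint embedding)."""
    H_rgb, C_rgb = selective_scan(x_rgb, *p_shared)
    H_tir, C_tir = selective_scan(x_tir, *p_shared)
    return decode(H_rgb, C_rgb), decode(H_tir, C_tir)

def make_params(rng, D, N):
    A = -np.exp(rng.standard_normal((D, N)))   # negative, so exp(dt * A) is a decay
    W_B = 0.1 * rng.standard_normal((D, N))
    W_C = 0.1 * rng.standard_normal((D, N))
    return A, W_B, W_C

# Usage with random stand-ins for flattened RGB and thermal feature tokens.
rng = np.random.default_rng(0)
T, D, N = 16, 8, 4                          # tokens, channels, state size (arbitrary)
x_rgb = rng.standard_normal((T, D))
x_tir = rng.standard_normal((T, D))
p_rgb, p_tir, p_shared = (make_params(rng, D, N) for _ in range(3))

cross_rgb, cross_tir = cross_parameter_branch(x_rgb, x_tir, p_rgb, p_tir)
shared_rgb, shared_tir = shared_parameter_branch(x_rgb, x_tir, p_shared)
fused = cross_rgb + cross_tir + shared_rgb + shared_tir   # hypothetical fusion by summation
print(fused.shape)  # (16, 8)
```

The scan touches each token once, so its cost grows linearly with sequence length while every output can still depend on all earlier tokens. This is the general property that lets SSM-based designs widen the receptive field without the quadratic cost of full attention, which relates to limitation (2) above.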
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Remote-Sensing Image Classification
