DIFF-MF: A Difference-Driven Channel-Spatial State Space Model for Multi-Modal Image Fusion
Yiming Sun, Zifan Ye, Qinghua Hu, Pengfei Zhu

TL;DR
DIFF-MF introduces a difference-driven state space model for multi-modal image fusion, effectively integrating features across channel and spatial dimensions to produce high-quality fused images with improved detail and thermal salience.
Contribution
The paper presents a novel difference-driven channel-spatial state space model that enhances multi-modal image fusion by leveraging feature discrepancy maps and cross-attention modules for adaptive, comprehensive fusion.
Findings
Outperforms existing methods in visual quality and quantitative metrics
Effectively captures global dependencies with linear complexity
Enhances thermal target salience while preserving visible details
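To make the linear-complexity finding concrete, here is a minimal sketch of a discrete state space recurrence, assuming a diagonal state matrix and a 1-D input sequence. It is not the DIFF-MF implementation; the function name ssm_scan and the parameters a, b, c are hypothetical and chosen only for illustration. Each step updates a fixed-size hidden state, so the cost grows linearly with sequence length, while every output still depends on all earlier inputs.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = <c, h_t> over a 1-D sequence.

    a, b, c are 1-D arrays of the same length (diagonal state matrix assumed).
    One pass over x, so the cost is O(len(x) * len(a)).
    """
    h = np.zeros_like(a, dtype=float)       # hidden state, fixed size
    y = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t                 # state update carries past context
        y[t] = float(np.dot(c, h))          # readout for position t
    return y

# Impulse response with a single state that decays by 0.5 per step.
y = ssm_scan(np.array([1.0, 0.0, 0.0]),
             a=np.array([0.5]), b=np.array([1.0]), c=np.array([1.0]))
```

In this sketch the impulse at the first position still influences later outputs through the hidden state, which is the sense in which a scan captures long-range dependencies without pairwise attention.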
Abstract
Multi-modal image fusion aims to integrate complementary information from multiple source images to produce high-quality fused images with enriched content. Although existing approaches based on state space models have achieved satisfactory performance with high computational efficiency, they tend to either over-prioritize infrared intensity at the cost of visible details, or conversely, preserve visible structure while diminishing thermal target salience. To overcome these challenges, we propose DIFF-MF, a novel difference-driven channel-spatial state space model for multi-modal image fusion. Our approach leverages feature discrepancy maps between modalities to guide feature extraction, followed by a fusion process across both channel and spatial dimensions. In the channel dimension, a channel-exchange module enhances channel-wise interaction through cross-attention dual state space…
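As a rough illustration of what a feature discrepancy map between modalities could look like, the sketch below computes a per-pixel absolute difference between infrared and visible feature tensors and uses it to re-weight one of them. This is an assumption made for illustration only: the excerpt does not specify how DIFF-MF builds or applies its discrepancy maps, and the names discrepancy_map and modulate, the channel-mean reduction, and the sigmoid gate are all hypothetical.

```python
import numpy as np

def discrepancy_map(feat_ir, feat_vis):
    """Per-pixel discrepancy between two (C, H, W) feature tensors.

    Hypothetical choice: mean absolute difference over channels.
    Returns an array of shape (1, H, W).
    """
    diff = np.abs(feat_ir - feat_vis)
    return diff.mean(axis=0, keepdims=True)

def modulate(feat, dmap):
    """Scale features up where the modalities disagree (illustrative gate)."""
    gate = 1.0 / (1.0 + np.exp(-dmap))      # sigmoid, values in (0, 1)
    return feat * (1.0 + gate)              # broadcasts over the channel axis

# Toy inputs: two channels on a 2x2 grid.
feat_ir = np.ones((2, 2, 2))
feat_vis = np.zeros((2, 2, 2))
dmap = discrepancy_map(feat_ir, feat_vis)
guided_ir = modulate(feat_ir, dmap)
```

The residual form feat * (1 + gate) is used here so that the original features are never suppressed; that too is a design choice of this sketch, not something stated in the abstract.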
Taxonomy
Topics: Advanced Image Fusion Techniques · Image Enhancement Techniques · Visual Attention and Saliency Detection
