DIFF-MF: A Difference-Driven Channel-Spatial State Space Model for Multi-Modal Image Fusion

Yiming Sun; Zifan Ye; Qinghua Hu; Pengfei Zhu

arXiv:2601.05538 · cs.CV · January 12, 2026

TL;DR

DIFF-MF introduces a difference-driven state space model for multi-modal image fusion, effectively integrating features across channel and spatial dimensions to produce high-quality fused images with improved detail and thermal salience.

Contribution

The paper presents a novel difference-driven channel-spatial state space model for multi-modal image fusion. It uses feature discrepancy maps between modalities to guide feature extraction, and cross-attention state space modules to fuse features along both the channel and spatial dimensions.

Findings

1. Outperforms existing methods in visual quality and quantitative metrics.
2. Effectively captures global dependencies with linear complexity.
3. Enhances thermal target salience while preserving visible details.
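The linear-complexity claim comes from how state space models process their input: a recurrence visits each element once. The sketch below is a generic, fixed-parameter discrete SSM, not the paper's module; the function name `ssm_scan` and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a discrete linear state space model over a 1-D sequence x.

    h_t = A @ h_{t-1} + B * x_t,   y_t = C @ h_t
    Each element is visited once, so cost is linear in len(x).
    """
    h = np.zeros(A.shape[0])          # hidden state, starts at zero
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t           # state update carries context forward
        y[t] = C @ h                  # read out one output per step
    return y

# Usage: a 3x4 feature map flattened in raster (row-major) order becomes a
# length-12 sequence, so one scan costs O(H*W) instead of the O((H*W)^2)
# pairwise interactions of full self-attention.
feat = np.arange(12, dtype=float).reshape(3, 4)
A = np.array([[0.5]])                 # hypothetical: state dimension 1, decay 0.5
B = np.array([1.0])
C = np.array([1.0])
out = ssm_scan(feat.reshape(-1), A, B, C).reshape(3, 4)
```

Mamba-style models go further by making the SSM parameters input-dependent (selective) and computing the recurrence with a parallel scan on GPU, and vision variants often scan an image along several directions. This sketch keeps the parameters fixed and uses a single raster scan so the linear cost is easy to see.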

Abstract

Multi-modal image fusion aims to integrate complementary information from multiple source images to produce high-quality fused images with enriched content. Although existing approaches based on state space models have achieved satisfactory performance with high computational efficiency, they tend either to over-prioritize infrared intensity at the cost of visible details or, conversely, to preserve visible structure while diminishing thermal target salience. To overcome these challenges, we propose DIFF-MF, a novel difference-driven channel-spatial state space model for multi-modal image fusion. Our approach leverages feature discrepancy maps between modalities to guide feature extraction, followed by a fusion process across both channel and spatial dimensions. In the channel dimension, a channel-exchange module enhances channel-wise interaction through cross-attention dual state space…
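The abstract does not say how the discrepancy maps are computed or how channels are exchanged, so the following is only a minimal NumPy sketch of the general idea under hypothetical choices: mean absolute difference over channels, min-max normalization, residual gating by `1 + D`, and a fixed swap of half the channels. The function names `discrepancy_map`, `difference_guided` and `channel_exchange` are illustrative; the paper's own modules use cross-attention dual state space blocks, which are not reproduced here.

```python
import numpy as np

def discrepancy_map(f_ir, f_vis):
    """Per-pixel discrepancy between two (C, H, W) feature maps, scaled to [0, 1].

    Hypothetical choice: mean absolute difference over channels, min-max normalized.
    """
    d = np.abs(f_ir - f_vis).mean(axis=0)          # (H, W)
    span = d.max() - d.min()
    return (d - d.min()) / span if span > 0 else np.zeros_like(d)

def difference_guided(f_ir, f_vis):
    """Emphasize features where the modalities disagree (hypothetical residual gating)."""
    d = discrepancy_map(f_ir, f_vis)[None]         # (1, H, W), broadcast over channels
    return f_ir * (1 + d), f_vis * (1 + d)

def channel_exchange(f_a, f_b, ratio=0.5):
    """Swap the first `ratio` fraction of channels between two modalities (hypothetical)."""
    k = int(f_a.shape[0] * ratio)
    out_a = np.concatenate([f_b[:k], f_a[k:]], axis=0)
    out_b = np.concatenate([f_a[:k], f_b[k:]], axis=0)
    return out_a, out_b

# Usage: toy 4-channel, 2x2 features that differ only at pixel (0, 0).
f_ir = np.zeros((4, 2, 2))
f_vis = np.zeros((4, 2, 2))
f_vis[:, 0, 0] = 1.0
g_ir, g_vis = difference_guided(f_ir, f_vis)
x_ir, x_vis = channel_exchange(g_ir, g_vis)
```

Gating by `1 + D` rather than `D` keeps features where the modalities agree instead of zeroing them, while amplifying the regions where one modality carries information the other lacks.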

Taxonomy

Topics: Advanced Image Fusion Techniques · Image Enhancement Techniques · Visual Attention and Saliency Detection