Multi-Modal Image Fusion via Intervention-Stable Feature Learning
Xue Wang, Zheng Guan, Wenhua Qian, Chengchao Wang, Runzhuo Ma

TL;DR
This paper introduces an intervention-based framework, inspired by causal principles, for multi-modal image fusion. It learns features that remain stable under input interventions such as masking and modality dropout, with the aim of improving fusion quality and generalization under distribution shifts.
Contribution
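It proposes three intervention strategies (complementary masking, random masking of identical regions, and modality dropout) and a Causal Feature Integrator that identifies and prioritizes stable, informative features across modalities, with the goal of improving robustness over correlation-driven fusion methods.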
Findings
Achieves state-of-the-art results on benchmark datasets
Demonstrates robustness under distribution shifts
Improves high-level vision task performance
Abstract
Multi-modal image fusion integrates complementary information from different modalities into a unified representation. Current methods predominantly optimize statistical correlations between modalities, often capturing dataset-induced spurious associations that degrade under distribution shifts. In this paper, we propose an intervention-based framework inspired by causal principles to identify robust cross-modal dependencies. Drawing insights from Pearl's causal hierarchy, we design three principled intervention strategies to probe different aspects of modal relationships: i) complementary masking with spatially disjoint perturbations tests whether modalities can genuinely compensate for each other's missing information, ii) random masking of identical regions identifies feature subsets that remain informative under partial observability, and iii) modality dropout evaluates the…
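The three interventions named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the patch-wise masking, the zero-fill for masked regions, the function names, and the parameters `patch`, `ratio`, and `p` are all assumptions chosen for illustration, and the inputs are assumed to be single-channel images of shape (H, W) that are already spatially aligned.

```python
import numpy as np


def patch_mask(height, width, patch, ratio, rng):
    """Binary keep-mask of shape (height, width): 1 = keep, 0 = masked.

    Each patch x patch block is masked independently with probability `ratio`.
    """
    gh = -(-height // patch)  # ceil division: number of patch rows
    gw = -(-width // patch)   # ceil division: number of patch columns
    grid = (rng.random((gh, gw)) >= ratio).astype(np.float32)
    full = np.kron(grid, np.ones((patch, patch), dtype=np.float32))
    return full[:height, :width]


def complementary_masking(img_a, img_b, patch=16, ratio=0.5, rng=None):
    """Mask spatially disjoint regions: where A is hidden, B is visible."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img_a.shape
    keep_a = patch_mask(h, w, patch, ratio, rng)
    keep_b = 1.0 - keep_a  # complement, so the masked regions never overlap
    return img_a * keep_a, img_b * keep_b


def random_masking(img_a, img_b, patch=16, ratio=0.5, rng=None):
    """Mask the identical regions in both modalities."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img_a.shape
    keep = patch_mask(h, w, patch, ratio, rng)
    return img_a * keep, img_b * keep


def modality_dropout(img_a, img_b, p=0.3, rng=None):
    """With probability p, zero out one modality chosen uniformly at random."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p:
        if rng.random() < 0.5:
            return np.zeros_like(img_a), img_b
        return img_a, np.zeros_like(img_b)
    return img_a, img_b
```

In a training loop, one plausible use is to apply one of these functions to each input pair before the fusion network, so that the network is encouraged to rely on features that remain informative when part of the input is removed. How the paper combines the perturbed inputs with its training objective is not stated in the text above.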
Taxonomy
Topics: Advanced Image Fusion Techniques · Image Enhancement Techniques · Visual Attention and Saliency Detection
