Multi-Modal Image Fusion via Intervention-Stable Feature Learning

Xue Wang; Zheng Guan; Wenhua Qian; Chengchao Wang; Runzhuo Ma

arXiv:2603.23272 · cs.CV · March 25, 2026

Open Access

TL;DR

This paper introduces an intervention-based causal framework for multi-modal image fusion, focusing on learning robust features that are stable under various perturbations to improve performance and generalization.

Contribution

The paper proposes three causal intervention strategies and a Causal Feature Integrator that identify and prioritize stable, informative features across modalities, with the aim of improving robustness over existing methods.

Findings

01 Achieves state-of-the-art results on benchmark datasets

02 Demonstrates robustness under distribution shifts

03 Improves high-level vision task performance

Abstract

Multi-modal image fusion integrates complementary information from different modalities into a unified representation. Current methods predominantly optimize statistical correlations between modalities, often capturing dataset-induced spurious associations that degrade under distribution shifts. In this paper, we propose an intervention-based framework inspired by causal principles to identify robust cross-modal dependencies. Drawing insights from Pearl's causal hierarchy, we design three principled intervention strategies to probe different aspects of modal relationships: i) complementary masking with spatially disjoint perturbations tests whether modalities can genuinely compensate for each other's missing information, ii) random masking of identical regions identifies feature subsets that remain informative under partial observability, and iii) modality dropout evaluates the…
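The three interventions named in the abstract can be pictured as perturbations applied to a pair of source images before fusion. The following is a minimal NumPy sketch, not the authors' implementation: the function names `patch_mask` and `intervene`, the patch size, the keep probability, the use of zero-filling for masked regions, and the choice to drop exactly one modality at random are all assumptions made for illustration. Inputs are assumed to be two single-channel arrays of the same shape.

```python
import numpy as np

def patch_mask(shape, patch=16, keep_prob=0.5, rng=None):
    """Random binary keep-mask (1 = visible) built from square patches.

    Hypothetical helper: patch size and keep probability are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    gh, gw = -(-h // patch), -(-w // patch)  # ceiling division
    grid = rng.random((gh, gw)) < keep_prob
    # Upsample the coarse grid to pixel resolution, then crop to the image size.
    full = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    return full[:h, :w].astype(np.float32)

def intervene(mod_a, mod_b, mode, patch=16, rng=None):
    """Apply one of three assumed interventions to a pair of modality images."""
    rng = np.random.default_rng() if rng is None else rng
    if mode == "complementary":
        # Spatially disjoint perturbations: a region hidden in one
        # modality stays visible in the other.
        m = patch_mask(mod_a.shape, patch, rng=rng)
        return mod_a * m, mod_b * (1.0 - m)
    if mode == "shared":
        # Identical regions are hidden in both modalities.
        m = patch_mask(mod_a.shape, patch, rng=rng)
        return mod_a * m, mod_b * m
    if mode == "dropout":
        # One whole modality is removed (assumed: chosen uniformly at random).
        if rng.random() < 0.5:
            return np.zeros_like(mod_a), mod_b.copy()
        return mod_a.copy(), np.zeros_like(mod_b)
    raise ValueError(f"unknown intervention mode: {mode}")
```

In a training loop, one might sample a mode per batch and compare the fused output of the perturbed pair with that of the clean pair; how the paper actually scores stability under these interventions is not stated in the truncated abstract above.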

Taxonomy

Topics: Advanced Image Fusion Techniques · Image Enhancement Techniques · Visual Attention and Saliency Detection