DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder
Dan He, Weisheng Li, Guofen Wang, Yuping Huang, and Shiqiang Liu

TL;DR
This paper introduces DM-FNet, a diffusion process-trained encoder-decoder network for multimodal medical image fusion, improving detail capture and feature interaction for higher quality fused images.
Contribution
The study presents a novel two-stage diffusion model-based fusion network that enhances feature extraction and fusion quality in multimodal medical imaging.
Findings
Outperforms existing methods on objective metrics
Preserves brightness, contrast, and detail effectively
Demonstrates robustness across various medical image types
Abstract
Multimodal medical image fusion (MMIF) extracts the most meaningful information from multiple source images, enabling a more comprehensive and accurate diagnosis. Achieving high-quality fusion results requires a careful balance of brightness, color, contrast, and detail; this ensures that the fused images effectively display relevant anatomical structures and reflect the functional status of the tissues. However, existing MMIF methods have limited capacity to capture detailed features during conventional training and suffer from insufficient cross-modal feature interaction, leading to suboptimal fused image quality. To address these issues, this study proposes a two-stage diffusion model-based fusion network (DM-FNet) to achieve unified MMIF. In Stage I, a diffusion process trains UNet for image reconstruction. UNet captures detailed information through progressive denoising and…
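The Stage I idea of training a denoiser by progressive denoising can be illustrated with a standard DDPM-style forward noising step. This is only a minimal sketch: the abstract does not give the noise schedule, the number of steps, or the loss, so the linear schedule, the 1000-step setting, the mean-squared noise loss, and all function names below are assumptions for illustration, not the authors' implementation. A flat list of pixel values stands in for an image.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (assumed; not taken from the paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bar[t])
    noise_scale = math.sqrt(1.0 - alpha_bar[t])
    x_t = [signal_scale * p + noise_scale * e for p, e in zip(x0, noise)]
    return x_t, noise


def denoising_loss(predicted_noise, true_noise):
    """Mean-squared error between predicted and true noise (assumed objective)."""
    n = len(true_noise)
    return sum((p - q) ** 2 for p, q in zip(predicted_noise, true_noise)) / n


rng = random.Random(0)
alpha_bar = cumulative_alpha(linear_beta_schedule(1000))
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy stand-in for an image
x_t, noise = q_sample(x0, 500, alpha_bar, rng)
```

In a training loop under these assumptions, a UNet would receive `x_t` and the step index and be optimized so that its output minimizes `denoising_loss` against `noise`.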
Taxonomy
Topics: Advanced Image Fusion Techniques
