SGDFuse: SAM-Guided Diffusion Model for High-Fidelity Infrared and Visible Image Fusion
Xiaoyang Zhang, Jinjiang Li, Guodong Fan, Yakun Ju, Linwei Fan, Jun Liu, Alex C. Kot

TL;DR
SGDFuse introduces a semantic-guided diffusion approach for infrared and visible image fusion, effectively preserving thermal targets and details by leveraging high-level semantic priors and a two-stage generative process.
Contribution
SGDFuse is a semantic-guided diffusion framework that reframes IVIF as a semantically aware generative task, improving both fusion quality and downstream task performance.
Findings
Achieves state-of-the-art image quality in IVIF
Enhances downstream perception tasks
Effectively preserves thermal targets and details
Abstract
Infrared and visible image fusion (IVIF) is essential for integrating thermal saliency with textural details to support downstream perception. However, most existing approaches suffer from "semantic blindness," leading to the erroneous suppression of thermal targets and the introduction of visual artifacts. To address this, we propose SAM-Guided Diffusion Fusion Network (SGDFuse), a novel Semantic-Guided Generation (SGG) framework that reframes IVIF as a semantically-steered generative task rather than simplistic pixel mapping. Our method uniquely couples high-level semantic priors from the Segment Anything Model (SAM) with the high-fidelity generative power of a conditional diffusion model. We employ a deliberate two-stage strategy to decouple multimodal alignment from iterative refinement: Stage I establishes a robust structural foundation via preliminary fusion, while Stage II…
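The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: `preliminary_fusion` is a hypothetical stand-in that uses an element-wise maximum where the paper presumably uses a learned network, the binary mask stands in for a SAM output, and because the description of Stage II is truncated above, the refinement is only a generic DDPM-style reverse loop with a placeholder noise predictor.

```python
import numpy as np

def preliminary_fusion(ir, vis):
    # Hypothetical Stage I stand-in: element-wise maximum of the two inputs.
    # The paper's Stage I is presumably a learned fusion network.
    return np.maximum(ir, vis)

def build_condition(fused, mask):
    # Stack the preliminary fusion and a semantic mask as conditioning channels.
    # The mask stands in for a SAM segmentation; SAM itself is not called here.
    return np.stack([fused, mask.astype(fused.dtype)], axis=0)

def toy_noise_predictor(x_t, cond, alpha_bar_t):
    # Placeholder for a learned denoiser. It treats the preliminary fusion as
    # the clean-image estimate x0 and solves
    #   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    # for eps. It ignores the mask channel; a real network would use it.
    x0_hat = cond[0]
    return (x_t - np.sqrt(alpha_bar_t) * x0_hat) / np.sqrt(1.0 - alpha_bar_t)

def ddpm_refine(cond, steps=50, seed=0):
    # Generic conditional DDPM ancestral sampling with a linear beta schedule.
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(cond[0].shape)  # start from Gaussian noise
    for t in range(steps - 1, -1, -1):
        eps = toy_noise_predictor(x, cond, alpha_bars[t])
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

if __name__ == "__main__":
    ir = np.array([[0.9, 0.1], [0.2, 0.8]])    # toy infrared patch
    vis = np.array([[0.3, 0.5], [0.6, 0.4]])   # toy visible patch
    mask = np.array([[1, 0], [0, 1]])          # toy semantic mask
    cond = build_condition(preliminary_fusion(ir, vis), mask)
    print(ddpm_refine(cond, steps=20))
```

With this placeholder predictor the loop simply converges back to the Stage I result, which keeps the sketch easy to check. Swapping in a trained, mask-aware noise predictor is where semantic guidance would actually take effect.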
