SGDFuse: SAM-Guided Diffusion Model for High-Fidelity Infrared and Visible Image Fusion

Xiaoyang Zhang; Jinjiang Li; Guodong Fan; Yakun Ju; Linwei Fan; Jun Liu; Alex C. Kot

arXiv:2508.05264 · cs.CV · March 24, 2026

TL;DR

SGDFuse introduces a semantic-guided diffusion approach to infrared and visible image fusion (IVIF). It preserves thermal targets and fine details by steering a two-stage generative process with high-level semantic priors from the Segment Anything Model (SAM).

Contribution

The paper presents a semantic-guided diffusion framework that reframes IVIF as a semantically aware generative task rather than a pixel-mapping problem, improving both fusion quality and downstream perception performance.

Findings

01. Achieves state-of-the-art image quality in IVIF
02. Improves performance on downstream perception tasks
03. Preserves thermal targets and fine details

Abstract

Infrared and visible image fusion (IVIF) is essential for integrating thermal saliency with textural details to support downstream perception. However, most existing approaches suffer from "semantic blindness," which leads to the erroneous suppression of thermal targets and the introduction of visual artifacts. To address this, we propose the SAM-Guided Diffusion Fusion Network (SGDFuse), a novel Semantic-Guided Generation (SGG) framework that reframes IVIF as a semantically steered generative task rather than simplistic pixel mapping. Our method uniquely couples high-level semantic priors from the Segment Anything Model (SAM) with the high-fidelity generative power of a conditional diffusion model. We employ a deliberate two-stage strategy to decouple multimodal alignment from iterative refinement: Stage I establishes a robust structural foundation via preliminary fusion, while Stage II…
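The abstract does not spell out how the conditional diffusion model is driven in Stage II (the text is truncated). As a hedged illustration only, the sketch below shows generic conditional DDPM ancestral sampling in NumPy, where the denoiser receives a stacked condition tensor. Purely for illustration, we assume this tensor holds a Stage-I preliminary fusion and a SAM-derived mask. The helpers `make_schedule`, `toy_denoiser`, and `sample`, the linear beta schedule, and `T=50` are hypothetical choices, not SGDFuse's implementation. In particular, `toy_denoiser` is an oracle that only reads the first condition channel, so the example demonstrates the sampling mechanics rather than semantic guidance itself.

```python
import numpy as np


def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (hypothetical choice) and its cumulative products."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, t, cond, alpha_bars):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t, cond).

    It treats cond[0] (assumed: the Stage-I fusion) as the clean target and
    returns the noise implied by x_t. cond[1] (assumed: a SAM mask) is ignored
    here; a real network would consume every condition channel.
    """
    ab = alpha_bars[t]
    return (x_t - np.sqrt(ab) * cond[0]) / np.sqrt(1.0 - ab)


def sample(cond, T=50, seed=0):
    """DDPM ancestral sampling conditioned on a (C, H, W) condition tensor."""
    betas, alphas, alpha_bars = make_schedule(T)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond.shape[1:])  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = toy_denoiser(x, t, cond, alpha_bars)
        # Mean of x_{t-1} given x_t and the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Posterior variance: beta_t * (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t)
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Usage: a 4x4 grey "Stage-I fusion" plus an all-zero placeholder mask
cond = np.stack([np.full((4, 4), 0.5), np.zeros((4, 4))])
fused = sample(cond)
print(fused.shape)  # (4, 4)
```

Because the oracle's noise estimate is exact and the first schedule step satisfies beta_0 = 1 - alpha_bar_0, the final step returns the Stage-I image itself. A trained denoiser would instead refine that image using all condition channels, which is where semantic priors such as a SAM mask could enter.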
