CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion

Yiming Sun; Yuan Ruan; Qinghua Hu; Pengfei Zhu

arXiv:2601.08619 · cs.CV · January 14, 2026

TL;DR

CtrlFuse introduces a controllable infrared and visible image fusion method guided by mask prompts, enabling dynamic, task-specific fusion with improved segmentation accuracy and fusion quality.

Contribution

The paper presents a framework that uses mask prompts to make infrared and visible image fusion interactive and controllable, and it reports better adaptability and performance than existing methods.

Findings

01 Achieves state-of-the-art controllability in image fusion.

02 Outperforms existing methods in segmentation accuracy.

03 Enhances fusion quality through task-specific semantic guidance.

Abstract

Infrared and visible image fusion generates all-weather perception-capable images by combining complementary modalities, enhancing environmental awareness for intelligent unmanned systems. Existing methods either focus on pixel-level fusion while overlooking downstream task adaptability or implicitly learn rigid semantics through cascaded detection/segmentation models, unable to interactively address diverse semantic target perception needs. We propose CtrlFuse, a controllable image fusion framework that enables interactive dynamic fusion guided by mask prompts. The model integrates a multi-modal feature extractor, a reference prompt encoder (RPE), and a prompt-semantic fusion module (PSFM). The RPE dynamically encodes task-specific semantic prompts by fine-tuning pre-trained segmentation models with input mask guidance, while the PSFM explicitly injects these semantics into fusion…
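
To make the idea of mask-prompted fusion concrete, here is a minimal sketch of how a mask can steer fusion pixel by pixel. It is not the paper's method: the RPE and PSFM are learned modules, while the function `mask_guided_fuse`, its `boost` parameter, and the average/maximum blending rule below are hand-written assumptions chosen only for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of floats in [0, 1]


def mask_guided_fuse(ir: Image, vis: Image, mask: Image, boost: float = 1.0) -> Image:
    """Toy mask-prompted fusion (hypothetical stand-in, not the learned RPE/PSFM).

    Outside the mask, each pixel is the plain average of the two modalities.
    Inside the mask, the result shifts toward the brighter modality,
    scaled by `boost` (expected range 0 to 1).
    """
    if not (len(ir) == len(vis) == len(mask)):
        raise ValueError("ir, vis and mask must have the same height")

    fused: Image = []
    for ir_row, vis_row, mask_row in zip(ir, vis, mask):
        if not (len(ir_row) == len(vis_row) == len(mask_row)):
            raise ValueError("ir, vis and mask must have the same width")
        out_row = []
        for a, b, m in zip(ir_row, vis_row, mask_row):
            baseline = 0.5 * (a + b)   # mask-agnostic fusion
            salient = max(a, b)        # keep the stronger response in the target
            w = m * boost              # how strongly the prompt overrides the baseline
            out_row.append((1.0 - w) * baseline + w * salient)
        fused.append(out_row)
    return fused


if __name__ == "__main__":
    ir = [[1.0, 0.0]]
    vis = [[0.0, 0.5]]
    mask = [[1.0, 0.0]]  # the prompt selects only the first pixel
    print(mask_guided_fuse(ir, vis, mask))
```

Changing the mask changes which regions are emphasized without retraining anything, which is the interactive behavior the abstract describes at a high level; in the actual framework that steering is done on learned features rather than raw pixels.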

Taxonomy

Topics: Advanced Image Fusion Techniques · Image Enhancement Techniques · Visual Attention and Saliency Detection