Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization

Weijian Su; Songqian Zhang; Yuqi Han; Jian Zhuang; Yongdong Huang; Qiang Zhang

arXiv:2605.06049 · cs.CV · May 8, 2026

TL;DR

This paper introduces DPOFusion, a framework for adaptive infrared and visible image fusion that uses direct preference optimization and latent diffusion models to align fusion results with diverse human and machine preferences.

Contribution

The paper presents DPOFusion, a framework that combines a property-aligned latent diffusion model and a preference-controllable latent diffusion model for flexible, task-guided image fusion.

Findings

01 Achieves precise preference alignment among humans, vision-language models, and task networks.

02 Sets a new benchmark for adaptive fusion quality and task transferability.

03 Demonstrates superior performance in preference-adaptive image fusion tasks.

Abstract

As a key technique in multi-modal processing, infrared and visible image fusion (IVIF) plays a crucial role in integrating complementary spectral information for visual enhancement and downstream vision tasks. Despite remarkable progress, existing methods struggle to flexibly accommodate heterogeneous demands. Achieving adaptive fusion that aligns with various preferences from both human and machine vision remains an open and challenging problem. To address this challenge, we propose DPOFusion, a direct preference optimization (DPO) framework integrating the property-aligned latent diffusion model (PALDM) and the preference-controllable latent diffusion model (PCLDM), enabling task-guided, preference-adaptive IVIF for both human and machine vision. The PALDM leverages a latent fusion prior and a joint conditional loss to generate diverse candidate fusion results with various properties.…
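For readers unfamiliar with direct preference optimization, the following is a minimal sketch of the generic pairwise DPO loss, assuming each candidate fusion result can be assigned a log-likelihood under both the model being trained and a frozen reference model. It is not the authors' implementation: the truncated abstract does not give the objective DPOFusion actually uses, and a latent diffusion model would likely need a diffusion-specific variant. The function name `dpo_loss`, its arguments, and the default `beta` are hypothetical and chosen only for illustration.

```python
import math


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Generic DPO loss for one preference pair (illustrative only).

    logp_win / logp_lose: log-likelihood of the preferred / rejected
        candidate under the model being trained (assumed inputs).
    ref_logp_win / ref_logp_lose: the same quantities under a frozen
        reference model (assumed inputs).
    beta: strength of the implicit penalty for drifting from the reference.
    """
    # Implicit reward margin: how much more the trained model favors the
    # preferred candidate than the reference model does.
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) equals softplus(-margin); computed stably.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# A model identical to the reference yields a loss of log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Raising the preferred candidate's likelihood lowers the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

Minimizing this loss pushes the trained model to rank the preferred candidate above the rejected one, relative to the reference, without fitting a separate reward model.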
