UniFusion: A Unified Image Fusion Framework with Robust Representation and Source-Aware Preservation
Xingyuan Li, Songcheng Du, Yang Zou, HaoYuan Xu, Zhiying Jiang, Jinyuan Liu

TL;DR
UniFusion is a unified image fusion framework that builds a shared semantic feature space with DINOv3 and uses bilevel optimization to combine information from multiple source images while preserving source details.
Contribution
It introduces a unified architecture with cross-task generalization, leveraging DINOv3 features and a bilevel optimization approach for improved fusion performance.
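For readers unfamiliar with the term, a bilevel optimization problem nests one optimization inside another. The generic textbook form is sketched below. The symbols $\theta$, $\phi$, $\mathcal{L}_{\mathrm{upper}}$, and $\mathcal{L}_{\mathrm{lower}}$ are illustrative assumptions; this summary does not state which parameters or losses UniFusion assigns to each level.

```latex
\begin{aligned}
\min_{\theta}\quad & \mathcal{L}_{\mathrm{upper}}\bigl(\theta,\ \phi^{*}(\theta)\bigr) \\
\text{s.t.}\quad & \phi^{*}(\theta) \in \arg\min_{\phi}\ \mathcal{L}_{\mathrm{lower}}(\theta,\ \phi)
\end{aligned}
```

The upper-level objective is evaluated at the solution of the lower-level problem, so the two sets of parameters cannot be optimized independently.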
Findings
Outperforms existing methods in multiple fusion tasks
Demonstrates strong generalization to real-world scenarios
Achieves superior visual quality and source preservation
Abstract
Image fusion aims to integrate complementary information from multiple source images to produce a more informative and visually consistent representation, benefiting both human perception and downstream vision tasks. Despite recent progress, most existing fusion methods are designed for specific tasks (i.e., multi-modal, multi-exposure, or multi-focus fusion) and struggle to effectively preserve source information during the fusion process. This limitation primarily arises from task-specific architectures and the degradation of source information caused by deep-layer propagation. To overcome these issues, we propose UniFusion, a unified image fusion framework designed to achieve cross-task generalization. First, leveraging DINOv3 for modality-consistent feature extraction, UniFusion establishes a shared semantic space for diverse inputs. Second, to preserve the understanding of each…
Taxonomy
Topics: Advanced Image Fusion Techniques · Image Enhancement Techniques · Visual Attention and Saliency Detection
