A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking
Zixiang Zhao, Haowen Bai, Bingxin Ke, Yukun Cui, Lilun Deng, Yulun Zhang, Kai Zhang, Konrad Schindler

TL;DR
This paper introduces UniVF, a unified framework for video fusion that incorporates multi-frame learning and optical flow to improve temporal consistency, supported by a new comprehensive benchmark called VF-Bench.
Contribution
The paper presents UniVF, a method for temporally coherent video fusion, and introduces VF-Bench, the first benchmark covering multiple video fusion tasks with a unified evaluation protocol.
Findings
UniVF achieves state-of-the-art results across all tasks on VF-Bench.
VF-Bench provides high-quality, well-aligned video pairs for comprehensive evaluation.
Extensive experiments validate the effectiveness of UniVF in improving video fusion quality.
Abstract
The real world is dynamic, yet most image fusion methods process static frames independently, ignoring temporal correlations in videos and leading to flickering and temporal inconsistency. To address this, we propose Unified Video Fusion (UniVF), a novel and unified framework for video fusion that leverages multi-frame learning and optical flow-based feature warping for informative, temporally coherent video fusion. To support its development, we also introduce Video Fusion Benchmark (VF-Bench), the first comprehensive benchmark covering four video fusion tasks: multi-exposure, multi-focus, infrared-visible, and medical fusion. VF-Bench provides high-quality, well-aligned video pairs obtained through synthetic data generation and rigorous curation from existing datasets, with a unified evaluation protocol that jointly assesses the spatial quality and temporal consistency of video…
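The abstract names optical flow-based feature warping as the mechanism for aligning neighbouring frames, but this summary does not describe UniVF's actual implementation. The following is a minimal, generic sketch of backward warping with bilinear interpolation in NumPy. The function name `warp_features`, the `(C, H, W)` layout, the flow convention (displacement in pixels from the current frame to the neighbouring frame) and the border clamping are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def warp_features(feat, flow):
    """Backward-warp a feature map toward the current frame (illustrative sketch).

    feat: array of shape (C, H, W), features of a neighbouring frame.
    flow: array of shape (2, H, W); flow[0] is the x displacement and
          flow[1] the y displacement, in pixels (assumed convention).
    Returns an array of shape (C, H, W) where out[:, y, x] is feat sampled
    bilinearly at (y + flow[1, y, x], x + flow[0, y, x]).
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Sampling positions, clamped to the image border (assumed padding mode).
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets act as bilinear interpolation weights.
    wx = x - x0
    wy = y - y0

    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Usage: shift a small single-channel map by one pixel along x.
feat = np.arange(12, dtype=float).reshape(1, 3, 4)
flow = np.zeros((2, 3, 4))
flow[0] = 1.0
warped = warp_features(feat, flow)
```

In a multi-frame pipeline, features warped this way from adjacent frames would typically be combined with the current frame's features before fusion. How UniVF combines them is not stated in this summary.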
Taxonomy
Topics: Advanced Vision and Imaging
