Bringing Multimodal Large Language Models to Infrared-Visible Image Fusion Quality Assessment
Yuchen Guo, Junli Gong, Yao Lu, Xintong Xu, Yiuming Cheung, Weifeng Su

TL;DR
This paper presents FuScore, a novel MLLM-based evaluation method for infrared-visible image fusion quality that produces continuous scores and aligns closely with human perception.
Contribution
It introduces FuScore, which uses an MLLM for fine-grained, continuous quality assessment and pairs it with a tripartite objective to improve correlation with human judgments.
Findings
FuScore achieves state-of-the-art correlation with human preferences.
It effectively discriminates among fused images of similar quality.
The method incorporates scene-level and method-level ordering for comprehensive evaluation.
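Correlation with human preferences is typically reported as a linear correlation and a rank correlation between predicted scores and mean opinion scores. The summary above does not say which coefficients the paper uses, so the following is only a minimal sketch assuming Pearson and Spearman coefficients; the function names and the sample numbers are illustrative and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only: predicted scores vs. human mean opinion scores.
predicted = [2.1, 3.4, 3.5, 4.8]
human_mos = [2.0, 3.6, 3.3, 4.9]
print(round(pearson(predicted, human_mos), 3))
print(round(spearman(predicted, human_mos), 3))
```

A rank coefficient is the more telling of the two when fused images are close in quality, because it penalises every swapped pair regardless of how small the score gap is.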
Abstract
Infrared-visible image fusion (IVIF) aims to integrate thermal information and detailed spatial structures into a single fused image to enhance perception. However, existing evaluation approaches tend to over-optimize both hand-crafted no-reference statistics and full-reference metrics that treat the source images as pseudo ground truths. Recent IVIF reward-modelling efforts learn from human ratings but use scalar regression on aggregated scores, neither leveraging the reasoning of Multimodal Large Language Models (MLLMs) nor encoding per-image perceptual ambiguity in their supervision. Naively introducing MLLMs with discrete one-hot supervision, in turn, collapses fused images of similar quality into different rating levels. To address this, we introduce FuScore, which utilizes an MLLM to mimic human visual perception by producing a continuous quality score, rather than discrete level…
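The contrast between discrete one-hot supervision and a continuous score can be illustrated with a small sketch. The abstract is truncated before it describes how FuScore actually derives its score, so what follows is not the paper's method: it assumes a hypothetical five-level rating scale and uses the probability-weighted average of level values, one common way to read a continuous score off per-level logits. All names and numbers are illustrative.

```python
from math import exp

# Hypothetical five-level rating scale (an assumption, not from the paper).
LEVEL_VALUES = [1.0, 2.0, 3.0, 4.0, 5.0]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(level_logits):
    """Continuous score as the probability-weighted mean of level values."""
    probs = softmax(level_logits)
    return sum(p * v for p, v in zip(probs, LEVEL_VALUES))


def one_hot_level(score):
    """One-hot target that snaps a human score to its nearest level."""
    nearest = min(
        range(len(LEVEL_VALUES)),
        key=lambda i: abs(LEVEL_VALUES[i] - score),
    )
    return [1.0 if i == nearest else 0.0 for i in range(len(LEVEL_VALUES))]


# Two fused images with nearly identical human scores receive different
# one-hot targets, which is the failure mode the abstract describes.
print(one_hot_level(3.49))  # level 3 is hot
print(one_hot_level(3.51))  # level 4 is hot

# Logits spread over two adjacent levels yield a score between them.
print(round(expected_score([-9.0, -9.0, 0.0, 0.0, -9.0]), 2))
```

Under these assumptions, a score difference of 0.02 flips the one-hot target by a full level, whereas the weighted average can land anywhere between adjacent levels.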
