TL;DR
This paper critiques current evaluation practice for single-image HDR reconstruction, showing that higher metric scores do not always correspond to better perceived image quality and calling for a standardized assessment protocol.
Contribution
The paper highlights biases in current evaluation methods and demonstrates that non-HDR methods can rival deep learning approaches on objective metrics, underscoring the need for better evaluation standards.
Findings
Objective metrics can be biased by certain reconstruction aspects.
Non-HDR methods can achieve high scores comparable to deep learning methods.
Standardized evaluation protocols are necessary for fair comparison.
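To make the first two findings concrete, here is a minimal toy sketch of how such a bias could arise. Everything in it is an assumption made for illustration, not the paper's protocol or data: the synthetic luminance values, the clipping level of 1.0, the 1.8x global scale error, and the log-domain PSNR helper `log_psnr` are all hypothetical. It compares a simulated method that leaves saturated pixels clipped against a simulated method that restores highlights but gets the overall scale wrong.

```python
import math


def log_psnr(reference, estimate):
    """PSNR on log10 luminance; the peak is the reference's log10 range.

    This is an illustrative metric, not one taken from the paper.
    """
    log_ref = [math.log10(r) for r in reference]
    log_est = [math.log10(e) for e in estimate]
    mse = sum((a - b) ** 2 for a, b in zip(log_ref, log_est)) / len(log_ref)
    peak = max(log_ref) - min(log_ref)
    return 10.0 * math.log10(peak ** 2 / mse)


# Synthetic reference (assumed): 950 well-exposed pixels in [0.01, 1.0]
# plus 50 highlight pixels at 10.0, which an LDR sensor would clip.
well_exposed = [0.01 + 0.99 * i / 949 for i in range(950)]
highlights = [10.0] * 50
reference = well_exposed + highlights

# Simulated "non-HDR" method: highlights stay clipped at 1.0,
# every other pixel is reproduced exactly.
clipped = [min(v, 1.0) for v in reference]

# Simulated HDR method (hypothetical): highlights are restored,
# but the whole image carries a 1.8x global scale error.
scaled = [1.8 * v for v in reference]

psnr_clipped = log_psnr(reference, clipped)
psnr_scaled = log_psnr(reference, scaled)

print(f"clipped, no highlight recovery: {psnr_clipped:.2f} dB")
print(f"highlights restored, scale off: {psnr_scaled:.2f} dB")
```

In this toy setup the clipped result scores higher, even though it recovers no highlight detail: a small error spread over every pixel outweighs a large error confined to 5% of them. Whether a real metric behaves this way depends on the metric, its parameters, and the data, which is one reason a shared evaluation protocol matters.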
Abstract
Single-image high dynamic range (SI-HDR) reconstruction has recently emerged as a problem well-suited for deep learning methods. Each successive technique demonstrates an improvement over existing methods by reporting higher image quality scores. This paper, however, highlights that such improvements in objective metrics do not necessarily translate to visually superior images. The first problem is the use of disparate evaluation conditions in terms of data and metric parameters, calling for a standardized protocol to make it possible to compare between papers. The second problem, which forms the main focus of this paper, is the inherent difficulty in evaluating SI-HDR reconstructions since certain aspects of the reconstruction problem dominate objective differences, thereby introducing a bias. Here, we reproduce a typical evaluation using existing as well as simulated SI-HDR methods to…
