Human vs Objective Evaluation of Colourisation Performance
Seán Mullery, Paul F. Whelan

TL;DR
This paper evaluates how well objective metrics align with human judgments in image colourisation, finding only weak correlation and highlighting the importance of hue accuracy for human perception.
Contribution
It introduces the Human Evaluated Colourisation Dataset (HECD) and analyses the correlation between objective measures and human opinion of colourisation quality.
Findings
Low correlation between objective metrics and human opinion.
Humans are most sensitive to incorrect hues in natural objects.
Objective measures only partially predict human preferences.
Abstract
Automatic colourisation of grey-scale images is the process of creating a full-colour image from the grey-scale prior. It is an ill-posed problem, as there are many plausible colourisations for a given grey-scale prior. The current SOTA in auto-colourisation involves image-to-image type Deep Convolutional Neural Networks, with Generative Adversarial Networks showing the greatest promise. The end goal of colourisation is to produce full-colour images that appear plausible to the human viewer, but human assessment is costly and time-consuming. This work assesses how well commonly used objective measures correlate with human opinion. We also attempt to determine what facets of colourisation have the most significant effect on human opinion. For each of 20 images from the BSD dataset, we create 65 recolourisations made up of local and global changes. Opinion scores are then crowd sourced…
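To make the idea of correlating an objective measure with human opinion concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' pipeline: PSNR stands in for "an objective measure", Spearman rank correlation stands in for the correlation statistic, and the function names and toy numbers are hypothetical. The abstract as quoted here does not say which measures or statistics the paper uses.

```python
import math


def psnr(reference, candidate, peak=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences.

    PSNR is used here only as an example objective measure (an assumption).
    """
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of samples")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up numbers for illustration only (not data from the paper):
# one objective score and one mean opinion score per recolourisation.
metric_scores = [30.1, 28.4, 25.0, 32.7, 27.9]
mean_opinion_scores = [3.9, 4.1, 2.0, 4.5, 3.0]

rho = spearman(metric_scores, mean_opinion_scores)
print(round(rho, 3))  # 0.9
```

A rank-based statistic is a common choice for this kind of comparison because opinion scores are ordinal and need not be linearly related to the metric. In practice, library routines such as `scipy.stats.spearmanr` would replace the hand-written helpers.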
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
