Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, Hongsheng Li

TL;DR
This paper introduces a large-scale human preference dataset and a new scoring model, HPS v2, to improve the evaluation of text-to-image generative models by better aligning with human judgments.
Contribution
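The paper presents HPD v2, a dataset of 798,090 human preference choices on 433,760 image pairs that the authors describe as the largest of its kind, and fine-tunes CLIP on it to obtain HPS v2, a scoring model that predicts human preferences on generated images more accurately than previous metrics.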
Findings
HPS v2 outperforms previous metrics in predicting human preferences.
HPS v2 generalizes well across various image distributions.
The benchmark facilitates fair comparison of recent text-to-image models.
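As a rough illustration of what "predicting human preferences" can mean in practice, the sketch below computes how often a metric's ranking of two images agrees with the human choice. It is a minimal sketch under stated assumptions: the function name `pairwise_accuracy`, the record layout, and the tie-handling rule are hypothetical, and the summary above does not state the exact evaluation protocol used in the paper.

```python
def pairwise_accuracy(records):
    """Fraction of image pairs where the metric agrees with the human choice.

    `records` is a hypothetical layout: an iterable of
    (score_a, score_b, human_pick) tuples, where human_pick is "a" or "b".
    """
    agree = 0
    total = 0
    for score_a, score_b, human_pick in records:
        # The metric "prefers" whichever image it scores higher.
        # Assumption: ties are resolved in favour of image "b".
        metric_pick = "a" if score_a > score_b else "b"
        if metric_pick == human_pick:
            agree += 1
        total += 1
    if total == 0:
        raise ValueError("no records to evaluate")
    return agree / total


# Toy data, invented purely for illustration (not from HPD v2).
toy_records = [
    (0.9, 0.1, "a"),  # metric and human agree
    (0.2, 0.8, "b"),  # agree
    (0.6, 0.4, "b"),  # disagree
    (0.3, 0.7, "b"),  # agree
]
accuracy = pairwise_accuracy(toy_records)
```

A metric that ranks pairs at random would score about 0.5 on this measure, so values are usually read relative to that baseline.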
Abstract
Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in previous datasets. By fine-tuning CLIP on HPD v2, we obtain Human Preference Score v2 (HPS v2), a scoring model that can more accurately predict human preferences on generated images. Our experiments demonstrate that HPS v2 generalizes better than previous metrics across various image distributions and…
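To make the idea of a CLIP-based preference score concrete, here is a minimal sketch of one common way such a score can be computed and trained: the score is the scaled cosine similarity between a prompt embedding and an image embedding, and a softmax over the candidate images gives the predicted probability that each is preferred. Everything here is an assumption for illustration: the embeddings are toy vectors rather than CLIP outputs, the names `preference_probs` and `preference_loss` and the `scale` value are hypothetical, and the abstract does not specify the exact training objective.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def preference_probs(text_emb, image_embs, scale=10.0):
    """Softmax over scaled text-image similarities.

    `scale` is a hypothetical temperature-like factor chosen for this sketch.
    """
    logits = [scale * cosine_similarity(text_emb, img) for img in image_embs]
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_loss(text_emb, image_embs, chosen_index, scale=10.0):
    """Cross-entropy against the human-chosen image (lower is better)."""
    probs = preference_probs(text_emb, image_embs, scale)
    return -math.log(probs[chosen_index])


# Toy embeddings: the first image is aligned with the prompt, the second is not.
prompt = [1.0, 0.0]
images = [[1.0, 0.0], [0.0, 1.0]]
probs = preference_probs(prompt, images)
loss_if_first_chosen = preference_loss(prompt, images, chosen_index=0)
loss_if_second_chosen = preference_loss(prompt, images, chosen_index=1)
```

In a real fine-tuning setup the embeddings would come from the image and text encoders, and minimizing a loss of this shape would push the model to score the human-preferred image higher for the given prompt.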
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Image Retrieval and Classification Techniques
Methods: Contrastive Language-Image Pre-training
