Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Xiaoshi Wu; Yiming Hao; Keqiang Sun; Yixiong Chen; Feng Zhu; Rui Zhao; Hongsheng Li

arXiv:2306.09341 · cs.CV · September 26, 2023 · 22 citations

TL;DR

This paper introduces a large-scale human preference dataset and a new scoring model, HPS v2, to improve the evaluation of text-to-image generative models by better aligning with human judgments.

Contribution

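The paper presents HPD v2, the largest dataset of its kind, with 798,090 human preference choices on 433,760 image pairs. It then fine-tunes CLIP on this dataset to obtain HPS v2, a scoring model that predicts human preferences on generated images more accurately than previous metrics.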

Findings

1. HPS v2 outperforms previous metrics in predicting human preferences.
2. HPS v2 generalizes well across various image distributions.
3. The benchmark facilitates fair comparison of recent text-to-image models.
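One common way to quantify how well a metric "predicts human preferences" is pairwise accuracy. This is the fraction of annotated image pairs on which the metric scores the human-preferred image higher. The sketch below assumes that definition. It also shows one simple way a metric can support model comparison: ranking models by their mean score over a shared prompt set. The function names `pairwise_accuracy` and `rank_models`, the tie-breaking rule, and every number in the usage example are hypothetical illustrations. They are not the paper's evaluation code or results.

```python
from statistics import mean


def pairwise_accuracy(score_pairs, human_choices):
    """Fraction of image pairs where the metric agrees with the annotator.

    score_pairs: list of (score_a, score_b) from some scoring model.
    human_choices: list of 0 (image a preferred) or 1 (image b preferred).
    """
    if len(score_pairs) != len(human_choices):
        raise ValueError("score_pairs and human_choices must have the same length")
    if not human_choices:
        raise ValueError("need at least one annotated pair")
    correct = 0
    for (score_a, score_b), choice in zip(score_pairs, human_choices):
        # Hypothetical convention: a tie is treated as preferring image b.
        predicted = 0 if score_a > score_b else 1
        correct += int(predicted == choice)
    return correct / len(human_choices)


def rank_models(scores_by_model):
    """Rank generative models by the mean score of their images, highest first.

    scores_by_model: dict mapping a model name to the scores of its images,
    assumed to be generated from the same shared set of prompts.
    """
    averages = {name: mean(scores) for name, scores in scores_by_model.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy scores and choices (hypothetical, not data from HPD v2).
    pairs = [(0.28, 0.25), (0.22, 0.27), (0.30, 0.31)]
    choices = [0, 1, 0]
    print(pairwise_accuracy(pairs, choices))  # prints 0.6666666666666666
    print(rank_models({"model_x": [0.27, 0.29], "model_y": [0.25, 0.26]}))
```

A fair comparison under this scheme depends on every model being scored on the same prompts. Otherwise differences in mean score may reflect prompt difficulty rather than model quality.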

Abstract

Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in previous datasets. By fine-tuning CLIP on HPD v2, we obtain Human Preference Score v2 (HPS v2), a scoring model that can more accurately predict human preferences on generated images. Our experiments demonstrate that HPS v2 generalizes better than previous metrics across various image distributions and…
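The abstract does not spell out the fine-tuning objective. The following is a minimal sketch that assumes a CLIP-style score and a pairwise cross-entropy loss. The score is a scaled cosine similarity between image and prompt embeddings. The loss pushes the human-preferred image's score above the other image's score. Plain lists stand in for CLIP encoder outputs. The names `preference_score` and `pairwise_preference_loss`, and the `logit_scale=100.0` default, are assumptions for illustration and do not come from the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def preference_score(image_emb, text_emb, logit_scale=100.0):
    """Assumed CLIP-style score: scaled image-prompt cosine similarity."""
    return logit_scale * cosine_similarity(image_emb, text_emb)


def pairwise_preference_loss(emb_a, emb_b, text_emb, preferred, logit_scale=100.0):
    """Cross-entropy over a softmax of the two images' scores.

    preferred: 0 if the annotator chose image a, 1 if image b.
    """
    scores = [
        preference_score(emb_a, text_emb, logit_scale),
        preference_score(emb_b, text_emb, logit_scale),
    ]
    m = max(scores)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[preferred]  # = -log softmax(scores)[preferred]


if __name__ == "__main__":
    # Toy embeddings (hypothetical): image a matches the prompt, image b does not.
    text = [1.0, 0.0]
    image_a = [2.0, 0.0]
    image_b = [0.0, 3.0]
    print(pairwise_preference_loss(image_a, image_b, text, preferred=0))  # near 0
    print(pairwise_preference_loss(image_a, image_b, text, preferred=1))  # near 100
```

With two images, this softmax cross-entropy equals -log σ(s_preferred - s_other), which is the Bradley-Terry form. Only the difference between the two scores affects the loss, so a trained scorer is meaningful for ranking rather than as an absolute quality value.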

Code & Models

Repositories

tgxs002/hpsv2 (PyTorch, official)

Models

nvidia/finite-difference-flow-optimization (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Image Retrieval and Classification Techniques

Methods: Contrastive Language-Image Pre-training