EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation

Shuhao Han; Haotian Fan; Jiachen Fu; Liang Li; Tao Li; Junhui Cui; Yunqiu Wang; Yang Tai; Jingwei Sun; Chunle Guo; Chongyi Li

arXiv:2412.18150·cs.CV·December 30, 2024

TL;DR

This paper introduces EvalMuse-40K, a comprehensive benchmark with human annotations for evaluating text-to-image models, along with two novel evaluation methods that improve fine-grained alignment assessment.

Contribution

The study provides a large, reliable dataset and two innovative evaluation techniques for more accurate assessment of image-text alignment in T2I models.

Findings

01 EvalMuse-40K contains 40K annotated image-text pairs.

02 Proposed methods outperform existing metrics in fine-grained evaluation.

03 Benchmark results help rank current AIGC models.

Abstract

Recently, Text-to-Image (T2I) generation models have achieved significant advancements. Correspondingly, many automated metrics have emerged to evaluate the image-text alignment capabilities of generative models. However, the performance comparison among these automated metrics is limited by existing small datasets. Additionally, these datasets lack the capacity to assess the performance of automated metrics at a fine-grained level. In this study, we contribute an EvalMuse-40K benchmark, gathering 40K image-text pairs with fine-grained human annotations for image-text alignment-related tasks. In the construction process, we employ various strategies such as balanced prompt sampling and data re-annotation to ensure the diversity and reliability of our benchmark. This allows us to comprehensively evaluate the effectiveness of image-text alignment metrics for T2I models. Meanwhile, we…

Code & Models

Repositories

DYEvaLab/EvalMuse (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Topic Modeling