EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation
Shuhao Han, Haotian Fan, Jiachen Fu, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Chunle Guo, Chongyi Li

TL;DR
This paper introduces EvalMuse-40K, a comprehensive benchmark with human annotations for evaluating text-to-image models, along with two novel evaluation methods that improve fine-grained alignment assessment.
Contribution
The study provides a large, reliable dataset and two innovative evaluation techniques for more accurate assessment of image-text alignment in T2I models.
Findings
EvalMuse-40K contains 40K annotated image-text pairs.
Proposed methods outperform existing metrics in fine-grained evaluation.
Benchmark results help rank current AIGC models.
Abstract
Recently, Text-to-Image (T2I) generation models have achieved significant advancements. Correspondingly, many automated metrics have emerged to evaluate the image-text alignment capabilities of generative models. However, the performance comparison among these automated metrics is limited by existing small datasets. Additionally, these datasets lack the capacity to assess the performance of automated metrics at a fine-grained level. In this study, we contribute an EvalMuse-40K benchmark, gathering 40K image-text pairs with fine-grained human annotations for image-text alignment-related tasks. In the construction process, we employ various strategies such as balanced prompt sampling and data re-annotation to ensure the diversity and reliability of our benchmark. This allows us to comprehensively evaluate the effectiveness of image-text alignment metrics for T2I models. Meanwhile, we…
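One common way to compare automated alignment metrics against human annotations, as the abstract describes, is to measure rank correlation between metric scores and human scores over the same image-text pairs. The following is a minimal sketch of that idea using Spearman rank correlation; it is not taken from the paper, and the function names and toy scores are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman correlation: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical toy data: one human alignment score and one metric score per pair.
human = [4.5, 2.0, 3.5, 1.0, 5.0]
metric = [0.81, 0.40, 0.77, 0.35, 0.90]
print(spearman(metric, human))
```

A metric whose ordering of pairs agrees more closely with the human ordering yields a value nearer to 1. Rank correlation is used here because it does not depend on the metric and the human scores sharing a scale.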
Taxonomy
Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Topic Modeling
