LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations
Zhichao Yang, Tianjiao Gu, Jianjie Wang, Feiyu Lin, Xiangfei Sheng, Pengfei Chen, Leida Li

TL;DR
LongT2IBench introduces a comprehensive benchmark with graph-structured annotations for evaluating long text-to-image generation, enabling more interpretable and fine-grained alignment assessments.
Contribution
This paper presents LongT2IBench, a novel benchmark with detailed graph annotations and a new evaluation model, LongT2IExpert, for improved assessment of long T2I models.
Findings
LongT2IExpert outperforms existing evaluators in alignment accuracy.
Graph-structured annotations enhance interpretability of evaluation results.
The benchmark and model facilitate better development of long T2I systems.
Abstract
The increasing popularity of long Text-to-Image (T2I) generation has created an urgent need for automatic and interpretable models that can evaluate the image-text alignment in long prompt scenarios. However, the existing T2I alignment benchmarks predominantly focus on short prompt scenarios and only provide MOS or Likert scale annotations. This inherent limitation hinders the development of long T2I evaluators, particularly in terms of the interpretability of alignment. In this study, we contribute LongT2IBench, which comprises 14K long text-image pairs accompanied by graph-structured human annotations. Given the detail-intensive nature of long prompts, we first design a Generate-Refine-Qualify annotation protocol to convert them into textual graph structures that encompass entities, attributes, and relations. Through this transformation, fine-grained alignment annotations are achieved…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsMultimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling
