LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations

Zhichao Yang; Tianjiao Gu; Jianjie Wang; Feiyu Lin; Xiangfei Sheng; Pengfei Chen; Leida Li

arXiv:2512.09271 · cs.CV · December 11, 2025



TL;DR

LongT2IBench introduces a comprehensive benchmark with graph-structured annotations for evaluating long text-to-image generation, enabling more interpretable and fine-grained alignment assessments.

Contribution

This paper presents LongT2IBench, a novel benchmark with detailed graph annotations and a new evaluation model, LongT2IExpert, for improved assessment of long T2I models.

Findings

1. LongT2IExpert outperforms existing evaluators in alignment accuracy.
2. Graph-structured annotations make evaluation results more interpretable.
3. The benchmark and model support the development of better long T2I systems.

Abstract

The increasing popularity of long Text-to-Image (T2I) generation has created an urgent need for automatic and interpretable models that can evaluate the image-text alignment in long prompt scenarios. However, the existing T2I alignment benchmarks predominantly focus on short prompt scenarios and only provide MOS or Likert scale annotations. This inherent limitation hinders the development of long T2I evaluators, particularly in terms of the interpretability of alignment. In this study, we contribute LongT2IBench, which comprises 14K long text-image pairs accompanied by graph-structured human annotations. Given the detail-intensive nature of long prompts, we first design a Generate-Refine-Qualify annotation protocol to convert them into textual graph structures that encompass entities, attributes, and relations. Through this transformation, fine-grained alignment annotations are achieved…


Code & Models

Models

yzc002/LongT2IExpert (Hugging Face model, 79 downloads)

Videos

LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations (Underline)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling