T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation

Kaiyi Huang; Chengqi Duan; Kaiyue Sun; Enze Xie; Zhenguo Li; Xihui Liu

arXiv:2307.06350·cs.CV·March 19, 2025·23 cites

TL;DR

T2I-CompBench++ is a comprehensive benchmark with new evaluation metrics for assessing the ability of text-to-image models to generate complex, compositional scenes involving multiple objects, attributes, and relationships.

Contribution

It introduces an enhanced benchmark with diverse prompts and novel evaluation metrics, including detection-based and MLLM-based assessments, for better evaluation of compositional text-to-image generation.

Findings

01. Benchmarking of 11 models, including state-of-the-art techniques.

02. Validation of the new metrics' effectiveness in evaluating compositional challenges.

03. Insights into the potential and limitations of MLLMs in evaluation.

Abstract

Despite the impressive advances in text-to-image models, they often struggle to effectively compose complex scenes with multiple objects, displaying various attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones like 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges. These include a detection-based metric tailored for evaluating 3D-spatial relationships and numeracy, and an analysis leveraging Multimodal Large…
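
To make the idea of a detection-based numeracy check concrete, here is a minimal Python sketch. It is not the paper's metric: the function name `numeracy_score`, the exact-match scoring rule, and the example prompt and detector output are all assumptions made for illustration. The sketch assumes an object detector has already produced a list of class labels for a generated image, and it scores the image by the fraction of prompted object classes whose detected count equals the requested count.

```python
from collections import Counter


def numeracy_score(expected, detected_labels):
    """Fraction of prompted object classes whose detected count matches exactly.

    expected: dict mapping class name -> count requested in the prompt (hypothetical format).
    detected_labels: list of class names, one per detected object (stand-in for detector output).
    """
    if not expected:
        raise ValueError("expected must name at least one object class")
    detected = Counter(detected_labels)
    # Counter returns 0 for classes the detector never reported.
    matches = sum(1 for name, count in expected.items() if detected[name] == count)
    return matches / len(expected)


# Hypothetical prompt: "three apples and two dogs"
expected = {"apple": 3, "dog": 2}
# Hypothetical detector output: three apples found, but only one dog.
detected_labels = ["apple", "apple", "apple", "dog"]
score = numeracy_score(expected, detected_labels)
print(score)
```

Exact matching is the strictest choice; a softer variant could give partial credit based on how far each detected count is from the requested one.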

Code & Models

Repositories

Karine-Huang/T2I-CompBench
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization