T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation
Kaiyi Huang, Chengqi Duan, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu

TL;DR
T2I-CompBench++ is a comprehensive benchmark with new evaluation metrics for assessing the ability of text-to-image models to generate complex, compositional scenes involving multiple objects, attributes, and relationships.
Contribution
The paper introduces an enhanced benchmark with diverse prompts and new evaluation metrics, including detection-based and MLLM-based assessments, for compositional text-to-image generation.
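To make the idea of a detection-based assessment concrete, here is a minimal sketch of how a numeracy check could be scored. It is not the paper's metric: the function name `numeracy_score`, the exact-match scoring rule, and the example prompt and detector output are all assumptions made for illustration. The sketch only assumes that some object detector has already returned one class label per detected instance.

```python
from collections import Counter


def numeracy_score(expected_counts, detected_labels):
    """Fraction of prompt objects whose detected count matches exactly.

    expected_counts: dict mapping object name -> count stated in the prompt.
    detected_labels: list of class labels, one per detected instance
                     (assumed to come from an external object detector).
    NOTE: exact-match scoring is an illustrative assumption, not the
    scoring rule used by T2I-CompBench++.
    """
    if not expected_counts:
        raise ValueError("expected_counts must not be empty")
    detected = Counter(detected_labels)
    # Counter returns 0 for labels the detector never produced.
    matches = sum(
        1 for name, count in expected_counts.items() if detected[name] == count
    )
    return matches / len(expected_counts)


# Hypothetical prompt: "three apples and two dogs"
expected = {"apple": 3, "dog": 2}
# Hypothetical detector output for one generated image
labels = ["apple", "apple", "apple", "dog"]
score = numeracy_score(expected, labels)
print(score)
```

In this toy case the apple count matches and the dog count does not, so half of the objects are counted correctly. A practical metric would also need to handle detector confidence thresholds and label synonyms, which this sketch ignores.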
Findings
Benchmarks 11 models, including state-of-the-art techniques.
Validates the effectiveness of the new metrics in evaluating compositional challenges.
Offers insights into the potential and limitations of MLLMs in evaluation.
Abstract
Despite the impressive advances in text-to-image models, they often struggle to effectively compose complex scenes with multiple objects, displaying various attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones like 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges. These include a detection-based metric tailored for evaluating 3D-spatial relationships and numeracy, and an analysis leveraging Multimodal Large…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
