Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation
Yucheng Zhou, Jiahao Yuan, Qianning Wang

TL;DR
This paper introduces LongBench-T2I, a comprehensive benchmark for evaluating text-to-image models on complex instructions, and proposes an agent framework, Plan2Gen, that improves image generation fidelity without retraining models.
Contribution
The paper presents a new benchmark for complex instruction-based image generation and an agent framework that enhances generation quality by decomposing prompts, without additional training.
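As a rough illustration of what "decomposing prompts, without additional training" could look like in practice, here is a minimal sketch. It is not the authors' Plan2Gen implementation: the function names (`decompose`, `build_plan`, `generate_from_plan`), the rule-based sentence splitter, and the stub model are all hypothetical, and a real agent would most likely use a language model for the planning step. The only idea taken from the summary above is that a long instruction is broken into parts and handed to an unchanged (frozen) T2I model.

```python
import re
from typing import Callable, List


def decompose(prompt: str) -> List[str]:
    """Split a long prompt into sub-instructions.

    Hypothetical rule-based stand-in: splits on periods and semicolons.
    """
    parts = re.split(r"[.;]\s*", prompt)
    return [p.strip() for p in parts if p.strip()]


def build_plan(sub_instructions: List[str]) -> str:
    """Turn the sub-instructions into a numbered plan string."""
    return "\n".join(f"{i}. {s}" for i, s in enumerate(sub_instructions, start=1))


def generate_from_plan(prompt: str, t2i_model: Callable[[str], object]) -> object:
    """Call an existing T2I model on the structured plan; the model is not retrained."""
    return t2i_model(build_plan(decompose(prompt)))


def echo_model(text: str) -> str:
    """Stub standing in for a real T2I model (assumption): returns its input."""
    return text


example_prompt = (
    "A red fox sits on a mossy log. Behind it, a waterfall; the sky is overcast"
)
example_steps = decompose(example_prompt)
example_output = generate_from_plan(example_prompt, echo_model)
```

Passing the model in as a plain callable keeps the sketch independent of any particular T2I library, which matches the summary's point that the framework wraps existing models rather than modifying them.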
Findings
LongBench-T2I includes 500 prompts across nine visual dimensions.
Plan2Gen effectively guides existing T2I models on complex prompts.
A new evaluation toolkit captures the nuances of complex instruction adherence.
Abstract
Recent advancements in text-to-image (T2I) generation have enabled models to produce high-quality images from textual descriptions. However, these models often struggle with complex instructions involving multiple objects, attributes, and spatial relationships. Existing benchmarks for evaluating T2I models primarily focus on general text-image alignment and fail to capture the nuanced requirements of complex, multi-faceted prompts. Given this gap, we introduce LongBench-T2I, a comprehensive benchmark specifically designed to evaluate T2I models under complex instructions. LongBench-T2I consists of 500 intricately designed prompts spanning nine diverse visual evaluation dimensions, enabling a thorough assessment of a model's ability to follow complex instructions. Beyond benchmarking, we propose an agent framework (Plan2Gen) that facilitates complex instruction-driven image generation…
Taxonomy
Topics: 3D Modeling in Geospatial Applications · Augmented Reality Applications · Human Motion and Animation
Methods: Focus · Sparse Evolutionary Training
