Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation

Yucheng Zhou; Jiahao Yuan; Qianning Wang

arXiv:2505.24787 · cs.CV · June 2, 2025

TL;DR

This paper introduces LongBench-T2I, a comprehensive benchmark for evaluating text-to-image models on complex instructions, and proposes an agent framework, Plan2Gen, that improves image generation fidelity without retraining models.

Contribution

The paper presents a new benchmark for complex instruction-based image generation and an agent framework that enhances generation quality by decomposing prompts, without additional training.
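This page does not describe how Plan2Gen actually decomposes prompts. As a toy illustration of the general "decompose, then generate with a frozen model" idea, the sketch below swaps in naive sentence splitting as the decomposer and treats the image generator as an injected callback. `decompose`, `plan_and_generate`, and the `generate(step, state)` signature are all hypothetical and are not the authors' implementation.

```python
import re
from typing import Callable, Optional


def decompose(prompt: str) -> list[str]:
    """Hypothetical decomposer: split a long prompt into sentence-level sub-instructions."""
    # Split on whitespace that follows '.', '!' or '?'; drop empty fragments.
    parts = re.split(r"(?<=[.!?])\s+", prompt.strip())
    return [p for p in parts if p]


def plan_and_generate(
    prompt: str,
    generate: Callable[[str, Optional[object]], object],
    init: Optional[object] = None,
) -> Optional[object]:
    """Feed each sub-instruction to an unchanged generator in turn (no retraining).

    `generate` is a hypothetical callback taking (sub_prompt, previous_state)
    and returning the new state, e.g. an image or an intermediate canvas.
    """
    state = init
    for step in decompose(prompt):
        state = generate(step, state)
    return state


if __name__ == "__main__":
    # Stand-in "generator" that just records the steps it was asked to apply.
    record = lambda step, st: (st or []) + [step]
    print(plan_and_generate("A red cube. A blue ball! Sunset sky?", record))
```

Injecting the generator as a callback keeps the planning logic independent of any particular T2I model, which matches the page's claim that no model is retrained.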

Findings

01

LongBench-T2I includes 500 prompts across nine visual dimensions.

02

Plan2Gen effectively guides existing T2I models on complex prompts.

03

A new evaluation toolkit captures the nuances of complex instruction adherence.
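The page does not specify how the toolkit turns judgments into scores. As a minimal sketch, assuming each generated image receives a score in [0, 1] for each of the nine visual dimensions, per-dimension and overall averages could be computed as below. The dimension names (`dim_a`, `dim_b`), the data layout, and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(scores: list[dict[str, float]]) -> dict[str, float]:
    """Average each dimension over all prompts, then add an unweighted 'overall' mean.

    `scores` holds one dict per prompt, mapping a (hypothetical) dimension
    name to that image's score on the dimension.
    """
    per_dim: dict[str, list[float]] = defaultdict(list)
    for record in scores:
        for dim, value in record.items():
            per_dim[dim].append(value)

    # Mean per dimension, then the mean of those per-dimension means.
    dim_means = {dim: mean(values) for dim, values in per_dim.items()}
    overall = mean(dim_means.values())
    return {**dim_means, "overall": overall}


if __name__ == "__main__":
    example = [
        {"dim_a": 1.0, "dim_b": 0.5},
        {"dim_a": 0.5, "dim_b": 0.0},
    ]
    print(aggregate(example))  # {'dim_a': 0.75, 'dim_b': 0.25, 'overall': 0.5}
```

Averaging per dimension before taking the overall mean keeps each dimension equally weighted even if some prompts lack a score for a given dimension.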

Abstract

Recent advancements in text-to-image (T2I) generation have enabled models to produce high-quality images from textual descriptions. However, these models often struggle with complex instructions involving multiple objects, attributes, and spatial relationships. Existing benchmarks for evaluating T2I models primarily focus on general text-image alignment and fail to capture the nuanced requirements of complex, multi-faceted prompts. Given this gap, we introduce LongBench-T2I, a comprehensive benchmark specifically designed to evaluate T2I models under complex instructions. LongBench-T2I consists of 500 intricately designed prompts spanning nine diverse visual evaluation dimensions, enabling a thorough assessment of a model's ability to follow complex instructions. Beyond benchmarking, we propose an agent framework (Plan2Gen) that facilitates complex instruction-driven image generation…

Code & Models

Repositories

yczhou001/longbench-t2i
PyTorch · Official

Datasets

YCZhou/LongBench-T2I
dataset · 28 dl

Taxonomy

Topics: 3D Modeling in Geospatial Applications · Augmented Reality Applications · Human Motion and Animation

Methods: Focus · Sparse Evolutionary Training