DeCoT: Decomposing Complex Instructions for Enhanced Text-to-Image Generation with Large Language Models
Xiaochuan Lin, Xiangyong Chen, Xuan Li, Yichen Su

TL;DR
DeCoT enhances text-to-image generation by decomposing complex instructions with large language models, significantly improving accuracy and fidelity in rendering intricate details, spatial relationships, and constraints.
Contribution
This paper introduces DeCoT, a novel framework that uses LLMs to decompose and clarify complex instructions, improving T2I models' understanding and output quality.
Findings
DeCoT improves T2I performance across multiple metrics.
DeCoT outperforms baseline models on the LongBench-T2I dataset.
Human evaluations favor DeCoT-enhanced images for fidelity and quality.
Abstract
Despite remarkable advancements, current Text-to-Image (T2I) models struggle with complex, long-form textual instructions, frequently failing to accurately render intricate details, spatial relationships, or specific constraints. This limitation is highlighted by benchmarks such as LongBench-T2I, which reveal deficiencies in handling composition, specific text, and fine textures. To address this, we propose DeCoT (Decomposition-CoT), a novel framework that leverages Large Language Models (LLMs) to significantly enhance T2I models' understanding and execution of complex instructions. DeCoT operates in two core stages: first, Complex Instruction Decomposition and Semantic Enhancement, where an LLM breaks down raw instructions into structured, actionable semantic units and clarifies ambiguities; second, Multi-Stage Prompt Integration and Adaptive Generation, which transforms these units…
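Stage 2 is only partially described above, so the following is a minimal sketch of the two-stage flow under stated assumptions. It is not the paper's implementation. The LLM call in Stage 1 is replaced by a hypothetical keyword-based stub (`stub_llm_decompose`). The `SemanticUnit` schema and its categories are hypothetical; they loosely mirror the failure modes the abstract names (composition, specific text, fine textures). `integrate` assumes, purely for illustration, that units are merged into cumulative prompts ordered from coarse to fine.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical coarse-to-fine stage order; not taken from the paper.
STAGE_ORDER = ("object", "composition", "text", "texture")


@dataclass
class SemanticUnit:
    """One structured, actionable piece of an instruction (hypothetical schema)."""
    category: str
    content: str


def stub_llm_decompose(instruction: str) -> List[SemanticUnit]:
    """Stand-in for the Stage 1 LLM call (hypothetical).

    Splits the instruction on ';' and tags each clause with a crude
    substring heuristic. A real system would prompt an LLM here.
    """
    units = []
    for clause in instruction.split(";"):
        clause = clause.strip()
        if not clause:
            continue  # skip empty fragments, e.g. from a trailing ';'
        lower = clause.lower()
        if any(w in lower for w in ("left", "right", "above", "below", "behind", "next to")):
            category = "composition"  # spatial relationship
        elif '"' in clause:
            category = "text"  # quoted text to render in the image
        elif any(w in lower for w in ("texture", "fabric", "grain", "surface")):
            category = "texture"  # fine-grained surface detail
        else:
            category = "object"
        units.append(SemanticUnit(category, clause))
    return units


def integrate(units: List[SemanticUnit]) -> List[str]:
    """Hypothetical Stage 2: build one prompt per non-empty stage.

    Each stage's prompt contains every unit from earlier stages plus its own,
    so generation could be conditioned progressively from layout to detail.
    """
    prompts: List[str] = []
    accumulated: List[str] = []
    for category in STAGE_ORDER:
        parts = [u.content for u in units if u.category == category]
        if parts:
            accumulated.extend(parts)
            prompts.append(", ".join(accumulated))
    return prompts


def decot_sketch(
    instruction: str,
    decompose: Callable[[str], List[SemanticUnit]] = stub_llm_decompose,
) -> List[str]:
    """Run the two hypothetical stages: decompose, then integrate."""
    return integrate(decompose(instruction))


if __name__ == "__main__":
    instruction = ('a red fox; a wooden sign to the left of the fox; '
                   'the sign reads "OPEN"; rough bark texture on the sign')
    for i, prompt in enumerate(decot_sketch(instruction), 1):
        print(f"stage {i}: {prompt}")
```

Passing a different `decompose` callable is the intended extension point: a real LLM-backed decomposer could replace the stub without changing `integrate`.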
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization
