DeCoT: Decomposing Complex Instructions for Enhanced Text-to-Image Generation with Large Language Models

Xiaochuan Lin; Xiangyong Chen; Xuan Li; Yichen Su

arXiv:2508.12396·cs.CV·August 19, 2025

TL;DR

DeCoT enhances text-to-image generation by decomposing complex instructions with large language models, significantly improving accuracy and fidelity in rendering intricate details, spatial relationships, and constraints.

Contribution

This paper introduces DeCoT, a novel framework that uses LLMs to decompose and clarify complex instructions, improving T2I models' understanding and output quality.

Findings

01 DeCoT improves T2I performance across multiple metrics.

02 DeCoT outperforms baseline models on the LongBench-T2I dataset.

03 Human evaluations favor DeCoT-enhanced images for fidelity and quality.

Abstract

Despite remarkable advancements, current Text-to-Image (T2I) models struggle with complex, long-form textual instructions, frequently failing to accurately render intricate details, spatial relationships, or specific constraints. This limitation is highlighted by benchmarks such as LongBench-T2I, which reveal deficiencies in handling composition, specific text, and fine textures. To address this, we propose DeCoT (Decomposition-CoT), a novel framework that leverages Large Language Models (LLMs) to significantly enhance T2I models' understanding and execution of complex instructions. DeCoT operates in two core stages: first, Complex Instruction Decomposition and Semantic Enhancement, where an LLM breaks down raw instructions into structured, actionable semantic units and clarifies ambiguities; second, Multi-Stage Prompt Integration and Adaptive Generation, which transforms these units…
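
The two stages described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the schema of a semantic unit, the unit categories, or how units are merged into prompts, and the excerpt is cut off before the second stage is fully described. `SemanticUnit`, `decompose`, and `integrate` are hypothetical names, and the rule-based `decompose` only stands in for the LLM call that the framework would make.

```python
from dataclasses import dataclass


@dataclass
class SemanticUnit:
    """One decomposed piece of an instruction (hypothetical schema)."""
    category: str  # e.g. "subject", "spatial", "text", "texture" (assumed labels)
    content: str


def decompose(instruction: str) -> list[SemanticUnit]:
    """Placeholder for the LLM decomposition stage.

    A real system would prompt an LLM here. This stand-in only splits on
    semicolons and reads clauses of the form "category: content"; clauses
    without a category label are tagged "general".
    """
    units = []
    for clause in instruction.split(";"):
        clause = clause.strip()
        if not clause:
            continue
        category, sep, content = clause.partition(":")
        if sep:
            units.append(SemanticUnit(category.strip().lower(), content.strip()))
        else:
            units.append(SemanticUnit("general", clause))
    return units


def integrate(units, order=("subject", "spatial", "text", "texture", "general")):
    """Placeholder for prompt integration: emit units grouped in a fixed category order."""
    parts = []
    for category in order:
        parts.extend(u.content for u in units if u.category == category)
    # Units with a category outside the assumed order are appended at the end.
    parts.extend(u.content for u in units if u.category not in order)
    return ", ".join(parts)
```

For example, `integrate(decompose("spatial: cat left of a lamp; subject: a grey cat"))` places the subject clause before the spatial clause. The fixed ordering is only one possible design choice; the adaptive strategy named in the abstract is not described in this excerpt.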

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization