coDrawAgents: A Multi-Agent Dialogue Framework for Compositional Image Generation

Chunhan Li; Qifeng Wu; Jia-Hui Pan; Ka-Hei Hui; Jingyu Hu; Yuming Jiang; Bin Sheng; Xihui Liu; Wenjuan Gong; Zhengzhe Liu

arXiv:2603.12829·cs.CV·March 16, 2026

TL;DR

coDrawAgents introduces a multi-agent dialogue framework whose specialized agents collaborate on compositional image generation, improving text-image alignment, spatial accuracy, and attribute fidelity in complex scenes.

Contribution

This work presents a multi-agent dialogue system in which four specialized agents (Interpreter, Planner, Checker, and Painter) collaborate to improve compositional text-to-image generation, targeting layout complexity and error correction.

Findings

01. Improves text-image alignment and attribute fidelity

02. Enhances spatial accuracy in generated images

03. Outperforms existing methods on benchmark datasets

Abstract

Text-to-image generation has advanced rapidly, but existing models still struggle to compose multiple objects faithfully and to preserve their attributes in complex scenes. We propose coDrawAgents, an interactive multi-agent dialogue framework in which four specialized agents (Interpreter, Planner, Checker, and Painter) collaborate to improve compositional generation. The Interpreter adaptively decides between a direct text-to-image pathway and a layout-aware multi-agent process. In the layout-aware mode, it parses the prompt into attribute-rich object descriptors, ranks them by semantic salience, and groups objects with the same semantic priority level for joint generation. Guided by the Interpreter, the Planner adopts a divide-and-conquer strategy, incrementally proposing layouts for objects with the same semantic priority level while grounding decisions in the evolving visual…
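The Interpreter's rank-and-group step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `ObjectDescriptor` class, its `priority` field (lower meaning more salient), and the `group_by_priority` helper are hypothetical names introduced only for illustration, and the priorities are hand-assigned here, whereas the paper's Interpreter derives them from the prompt.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class ObjectDescriptor:
    """Hypothetical attribute-rich descriptor for one object in the prompt."""
    name: str
    attributes: list[str] = field(default_factory=list)
    priority: int = 0  # assumption: lower value = higher semantic salience


def group_by_priority(objects):
    """Rank descriptors by priority, then group equal-priority ones together.

    sorted() is stable, so objects that share a priority keep their
    original prompt order inside each group.
    """
    ranked = sorted(objects, key=lambda o: o.priority)
    return [list(group) for _, group in groupby(ranked, key=lambda o: o.priority)]


# Hand-assigned priorities, for illustration only.
objects = [
    ObjectDescriptor("mat", ["red", "woven"], priority=1),
    ObjectDescriptor("window", ["open"], priority=2),
    ObjectDescriptor("cat", ["black", "sleeping"], priority=0),
    ObjectDescriptor("lamp", ["brass"], priority=1),
]

groups = group_by_priority(objects)
group_names = [[o.name for o in group] for group in groups]
```

In this sketch each inner list of `groups` stands for one round of joint layout planning, so a Planner-like component would handle the most salient group first and later groups against the partially composed scene.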

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games