MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

Mingcheng Li; Xiaolu Hou; Ziyang Liu; Dingkang Yang; Ziyun Qian; Jiawei Chen; Jinjie Wei; Yue Jiang; Qingyao Xu; Lihua Zhang

arXiv:2505.02648·cs.CV·May 7, 2025

TL;DR

This paper introduces MCCD, a multi-agent collaboration approach for diffusion models aimed at complex-scene text-to-image generation: MLLM-based agents parse the scene into its elements, and a hierarchical compositional diffusion stage refines the object regions.

Contribution

The paper presents a multi-agent collaboration framework for scene parsing and a hierarchical compositional diffusion method, which together are designed to handle complex prompts involving multiple objects, characteristics, and relations.

Findings

01. Significant performance improvements over baseline models.

02. Effective scene parsing with a multi-agent system.

03. Enhanced object region refinement and high-fidelity scene generation.

Abstract

Diffusion models have shown excellent performance in text-to-image generation. Nevertheless, existing methods often suffer from performance bottlenecks when handling complex prompts that involve multiple objects, characteristics, and relations. Therefore, we propose Multi-agent Collaboration-based Compositional Diffusion (MCCD) for text-to-image generation of complex scenes. Specifically, we design a multi-agent collaboration-based scene parsing module that generates an agent system comprising multiple agents with distinct tasks, utilizing MLLMs to extract various scene elements effectively. In addition, hierarchical compositional diffusion utilizes a Gaussian mask and filtering to refine bounding box regions and enhance objects through region enhancement, resulting in the accurate and high-fidelity generation of complex scenes. Comprehensive experiments demonstrate that our MCCD…
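The abstract mentions a Gaussian mask and filtering applied to bounding box regions but does not give the formulation. As a rough illustration of the general idea only, the sketch below builds a Gaussian weight map centred on a box and then zeroes out weak weights. The function names, the `sigma_scale` parameter, and the threshold-based filter are all assumptions made for this example; they are not taken from the paper.

```python
import math


def gaussian_box_mask(height, width, box, sigma_scale=0.5):
    """Build a height x width grid of weights that peak at the box centre.

    box is (x0, y0, x1, y1) in pixel coordinates with x1 > x0 and y1 > y0.
    sigma_scale is a hypothetical knob: the standard deviation along each
    axis is sigma_scale times the box size along that axis.
    Pixels outside the box get weight 0.0.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = sigma_scale * (x1 - x0)
    sy = sigma_scale * (y1 - y0)
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            if x0 <= x < x1 and y0 <= y < y1:
                # Evaluate the Gaussian at the pixel centre (x + 0.5, y + 0.5).
                dx = (x + 0.5 - cx) / sx
                dy = (y + 0.5 - cy) / sy
                row.append(math.exp(-0.5 * (dx * dx + dy * dy)))
            else:
                row.append(0.0)
        mask.append(row)
    return mask


def filter_mask(mask, threshold):
    """Assumed filtering step: drop weights below threshold, keep the rest."""
    return [[w if w >= threshold else 0.0 for w in row] for row in mask]


if __name__ == "__main__":
    soft = gaussian_box_mask(8, 8, (2, 2, 6, 6))
    sharp = filter_mask(soft, 0.9)
    for row in sharp:
        print(" ".join(f"{w:.2f}" for w in row))
```

In a region-based pipeline, a map like this could be used to weight how strongly an object-specific signal is blended into the area around its bounding box, with the filter confining the effect to the box core. How MCCD actually combines the mask with the diffusion process is not stated in this excerpt.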

Taxonomy

Topics: Data Visualization and Analytics · Semantic Web and Ontologies · Video Analysis and Summarization

Methods: Diffusion