MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li, Xiaolu Hou, Ziyang Liu, Dingkang Yang, Ziyun Qian,, Jiawei Chen, Jinjie Wei, Yue Jiang, Qingyao Xu, Lihua Zhang

TL;DR
This paper introduces MCCD, a novel multi-agent collaboration approach for diffusion models that significantly improves complex scene text-to-image generation by effectively parsing scenes and refining object regions.
Contribution
The paper presents a multi-agent collaboration framework and hierarchical compositional diffusion method, enabling better handling of complex prompts in text-to-image generation.
Findings
Significant performance improvements over baseline models.
Effective scene parsing with multi-agent system.
Enhanced object region refinement and high-fidelity scene generation.
Abstract
Diffusion models have shown excellent performance in text-to-image generation. Nevertheless, existing methods often suffer from performance bottlenecks when handling complex prompts that involve multiple objects, characteristics, and relations. Therefore, we propose a Multi-agent Collaboration-based Compositional Diffusion (MCCD) for text-to-image generation for complex scenes. Specifically, we design a multi-agent collaboration-based scene parsing module that generates an agent system comprising multiple agents with distinct tasks, utilizing MLLMs to extract various scene elements effectively. In addition, Hierarchical Compositional diffusion utilizes a Gaussian mask and filtering to refine bounding box regions and enhance objects through region enhancement, resulting in the accurate and high-fidelity generation of complex scenes. Comprehensive experiments demonstrate that our MCCD…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsData Visualization and Analytics · Semantic Web and Ontologies · Video Analysis and Summarization
MethodsDiffusion
