RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui

TL;DR
RealCompo is a training-free framework that dynamically balances existing text-to-image and spatial-aware diffusion models to improve multi-object compositional generation while maintaining realism.
Contribution
It introduces a novel balancer mechanism that enables plug-and-play integration of models, enhancing compositionality without additional training.
Findings
Outperforms state-of-the-art models in multi-object compositional tasks
Maintains high realism and compositionality in generated images
Seamlessly extends to various spatial-aware diffusion models
Abstract
Diffusion models have achieved remarkable advancements in text-to-image generation. However, existing models still struggle with multiple-object compositional generation. In this paper, we propose RealCompo, a new training-free and transfer-friendly text-to-image generation framework that leverages the respective advantages of text-to-image models and spatial-aware image diffusion models (e.g., layout, keypoints and segmentation maps) to enhance both the realism and the compositionality of the generated images. An intuitive and novel balancer is proposed to dynamically balance the strengths of the two models during the denoising process, allowing plug-and-play use of any model without extra training. Extensive experiments show that our RealCompo consistently outperforms state-of-the-art text-to-image models and spatial-aware image diffusion models in multiple-object…
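The abstract describes the balancer only at a high level, so the following is a minimal sketch of one way two models' noise predictions could be blended at a single denoising step. Everything in it is an assumption made for illustration: the function names `softmax` and `balance_noise`, the use of one scalar coefficient per model, and the softmax weighting are hypothetical and are not taken from the paper.

```python
import math


def softmax(coeffs):
    """Normalize raw coefficients into positive weights that sum to 1."""
    m = max(coeffs)  # subtract the max for numerical stability
    exps = [math.exp(c - m) for c in coeffs]
    total = sum(exps)
    return [e / total for e in exps]


def balance_noise(eps_t2i, eps_spatial, coeff_t2i, coeff_spatial):
    """Blend two noise predictions given as flat lists of floats.

    Hypothetical helper: the softmax weighting is an illustrative
    assumption, not the balancer defined by the RealCompo authors.
    """
    if len(eps_t2i) != len(eps_spatial):
        raise ValueError("noise predictions must have the same length")
    w_t2i, w_spatial = softmax([coeff_t2i, coeff_spatial])
    return [w_t2i * a + w_spatial * b for a, b in zip(eps_t2i, eps_spatial)]


# Equal coefficients give an equal-weight average of the two predictions.
blended = balance_noise([1.0, 3.0], [3.0, 1.0], 0.0, 0.0)
print(blended)
```

In this sketch, raising one coefficient relative to the other shifts the blended prediction toward that model, which is the sense in which a balancer could trade realism against compositionality from step to step. How the coefficients would be updated during denoising is not specified in the text above and is left out here.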
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Diffusion
