Synthetic Curriculum Reinforces Compositional Text-to-Image Generation
Shijian Wang, Runhao Fu, Siyi Zhao, Qingqin Zhan, Xingjian Wang, Jiarui Jin, Yuan Lu, Hanqian Wu, Cunjian Chen

TL;DR
This paper introduces CompGen, a reinforcement learning framework using scene graphs and curriculum strategies to improve the compositional capabilities of text-to-image models, especially in complex scene synthesis.
Contribution
It proposes a curriculum reinforcement learning method with adaptive graph sampling that enhances compositional text-to-image (T2I) generation and can be integrated into existing models.
Findings
CompGen improves compositional generation in diffusion and auto-regressive models.
Easy-to-hard and Gaussian curriculum strategies outperform random sampling.
CompGen significantly enhances the models' ability to synthesize complex scenes.
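To make the difference between the three sampling strategies concrete, here is a minimal Python sketch of how a training loop might choose a difficulty level at each step. It is an illustration under assumptions, not the paper's implementation: the function name `target_difficulty`, the discrete difficulty levels, the linear progress schedule, and the reading of "Gaussian" as sampling around a center that moves with training progress are all hypothetical.

```python
import math
import random


def target_difficulty(step, total_steps, levels, strategy, rng, sigma=1.0):
    """Pick a difficulty level in [0, levels - 1] for one training step.

    All names and the schedule shape are assumptions made for illustration.
    """
    progress = step / max(1, total_steps - 1)  # 0.0 at the start, 1.0 at the end
    center = progress * (levels - 1)           # difficulty the schedule focuses on

    if strategy == "random":
        # Baseline: ignore training progress entirely.
        return rng.randrange(levels)
    if strategy == "easy_to_hard":
        # Deterministic ramp from the easiest to the hardest level.
        return round(center)
    if strategy == "gaussian":
        # Sample near the moving center, so easier and harder samples
        # are still mixed in with lower probability.
        weights = [
            math.exp(-((d - center) ** 2) / (2 * sigma ** 2))
            for d in range(levels)
        ]
        return rng.choices(range(levels), weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    rng = random.Random(0)
    for strategy in ("random", "easy_to_hard", "gaussian"):
        picks = [target_difficulty(s, 10, 5, strategy, rng) for s in range(10)]
        print(strategy, picks)
```

Under these assumptions, the easy-to-hard schedule never revisits easier levels, while the Gaussian schedule keeps some overlap between neighboring levels.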
Abstract
Text-to-Image (T2I) generation has long been an open problem, with compositional synthesis remaining particularly challenging. This task requires accurate rendering of complex scenes containing multiple objects that exhibit diverse attributes as well as intricate spatial and semantic relationships, demanding both precise object placement and coherent inter-object interactions. In this paper, we propose a novel compositional curriculum reinforcement learning framework named CompGen that addresses compositional weakness in existing T2I models. Specifically, we leverage scene graphs to establish a novel difficulty criterion for compositional ability and develop a corresponding adaptive Markov Chain Monte Carlo graph sampling algorithm. This difficulty-aware approach enables the synthesis of training curriculum data that progressively optimize T2I models through reinforcement learning. We…
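The idea of scoring scene graphs by difficulty and sampling graphs near a chosen difficulty can be sketched in Python as follows. This is a rough illustration only: the summary does not state the paper's difficulty criterion or how its sampler adapts, so the sketch assumes difficulty is the total count of objects, attributes, and relations, and it uses a plain (non-adaptive) Metropolis-Hastings chain. `GraphShape`, `difficulty`, `is_valid`, and `sample_shape` are hypothetical names.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GraphShape:
    objects: int     # number of object nodes
    attributes: int  # number of attributes attached to objects
    relations: int   # number of directed relation edges between objects


def difficulty(g):
    # Assumed criterion: total number of elements in the scene graph.
    return g.objects + g.attributes + g.relations


def is_valid(g, max_attrs_per_object=3):
    # At least one object; attributes and relations bounded by the object count.
    return (
        g.objects >= 1
        and 0 <= g.attributes <= max_attrs_per_object * g.objects
        and 0 <= g.relations <= g.objects * (g.objects - 1)
    )


def log_target(g, target, sigma):
    # Unnormalized log-density peaked at the requested difficulty.
    return -((difficulty(g) - target) ** 2) / (2 * sigma ** 2)


def sample_shape(target, steps=500, sigma=1.0, seed=0):
    """Metropolis-Hastings over graph shapes with a symmetric +/-1 proposal."""
    rng = random.Random(seed)
    current = GraphShape(objects=1, attributes=0, relations=0)
    for _ in range(steps):
        field = rng.choice(("objects", "attributes", "relations"))
        delta = rng.choice((-1, 1))
        candidate = replace(current, **{field: getattr(current, field) + delta})
        if not is_valid(candidate):
            continue  # zero target density: reject and keep the current state
        log_ratio = log_target(candidate, target, sigma) - log_target(current, target, sigma)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = candidate
    return current


if __name__ == "__main__":
    shape = sample_shape(target=8, steps=2000)
    print(shape, "difficulty:", difficulty(shape))
```

A sampled shape would still need to be filled in with concrete object, attribute, and relation labels and turned into a prompt; that step is outside this sketch.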
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
