All Seeds Are Not Equal: Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann

TL;DR
This paper investigates how initial random seeds affect compositional image generation in diffusion models and proposes a method to select reliable seeds and fine-tune models for improved consistency and accuracy.
Contribution
The paper introduces a seed mining technique to identify reliable initial noise patterns, enhancing compositional image generation without manual annotation.
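As a rough illustration of the seed mining idea, the sketch below ranks candidate seeds by the fraction of compositional prompts they render correctly. It is a minimal sketch under stated assumptions, not the authors' implementation: `mine_reliable_seeds`, `generate`, and `is_correct` are hypothetical names, with `generate` standing in for a seeded text-to-image call and `is_correct` for any automatic checker that replaces manual annotation.

```python
from typing import Callable, List, Sequence, Tuple


def mine_reliable_seeds(
    seeds: Sequence[int],
    prompts: Sequence[str],
    generate: Callable[[str, int], object],      # hypothetical: (prompt, seed) -> image
    is_correct: Callable[[object, str], bool],   # hypothetical: automatic composition check
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return the top_k (seed, accuracy) pairs, most reliable seed first."""
    if not prompts:
        raise ValueError("at least one prompt is required")

    scored = []
    for seed in seeds:
        # Count how many prompts this seed renders correctly.
        hits = sum(1 for prompt in prompts if is_correct(generate(prompt, seed), prompt))
        scored.append((seed, hits / len(prompts)))

    # Highest accuracy first; ties broken by smaller seed value for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

Images produced with the top-ranked seeds could then serve as a curated fine-tuning set; how many seeds to keep and which checker to use are design choices this sketch leaves open.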
Findings
Fine-tuning on images generated from reliable seeds significantly improves compositional accuracy.
Reliable seeds lead to more consistent object placement in generated images.
Quantitative gains of up to 60.7% in spatial composition accuracy.
Abstract
Text-to-image diffusion models have demonstrated remarkable capability in generating realistic images from arbitrary text prompts. However, they often produce inconsistent results for compositional prompts such as "two dogs" or "a penguin on the right of a bowl". Understanding these inconsistencies is crucial for reliable image generation. In this paper, we highlight the significant role of initial noise in these inconsistencies, where certain noise patterns are more reliable for compositional prompts than others. Our analyses reveal that different initial random seeds tend to guide the model to place objects in distinct image areas, potentially adhering to specific patterns of camera angles and image composition associated with the seed. To improve the model's compositional ability, we propose a method for mining these reliable cases, resulting in a curated training set of generated…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
Methods: Sparse Evolutionary Training · Diffusion
