PromptMoG: Enhancing Diversity in Long-Prompt Image Generation via Prompt Embedding Mixture-of-Gaussian Sampling
Bo-Kai Ruan, Teng-Fang Hsiao, Ling Lo, Yi-Lun Wu, Hong-Han Shuai

TL;DR
This paper studies the loss of diversity in long-prompt text-to-image generation, introduces the LPD-Bench benchmark to evaluate it, and proposes PromptMoG, a training-free sampling method that enhances diversity without sacrificing fidelity.
Contribution
The paper systematically studies the fidelity-diversity trade-off under long prompts, introduces LPD-Bench for evaluation, and proposes PromptMoG, a training-free method based on Mixture-of-Gaussians sampling of prompt embeddings.
Findings
PromptMoG improves diversity across multiple models.
Long prompts tend to reduce diversity in generated images.
LPD-Bench provides a standardized way to evaluate fidelity and diversity.
Abstract
Recent advances in text-to-image (T2I) generation have achieved remarkable visual outcomes through large-scale rectified flow models. However, how these models behave under long prompts remains underexplored. Long prompts encode rich content, spatial, and stylistic information that enhances fidelity but often suppresses diversity, leading to repetitive and less creative outputs. In this work, we systematically study this fidelity-diversity dilemma and reveal that state-of-the-art models exhibit a clear drop in diversity as prompt length increases. To enable consistent evaluation, we introduce LPD-Bench, a benchmark designed for assessing both fidelity and diversity in long-prompt generation. Building on our analysis, we develop a theoretical framework that increases sampling entropy through prompt reformulation and propose a training-free method, PromptMoG, which samples prompt…
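The abstract is cut off before it describes how PromptMoG samples, so the following is only a minimal sketch of generic Mixture-of-Gaussians sampling in an embedding space, not the authors' implementation. The function name `sample_mog_embeddings`, the isotropic shared `sigma`, the uniform default weights, and the idea that each component mean is the embedding of one prompt reformulation are all assumptions made for illustration.

```python
import numpy as np


def sample_mog_embeddings(centers, sigma, n_samples, weights=None, seed=0):
    """Draw samples from an isotropic Mixture-of-Gaussians over embeddings.

    centers   : (K, D) array of component means. Hypothetical: one mean per
                reformulated prompt embedding; the source does not say how
                the components are built.
    sigma     : standard deviation shared by all components (an assumption).
    n_samples : number of embeddings to draw.
    weights   : optional (K,) mixture weights; uniform if omitted.
    """
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=np.float64)
    k, d = centers.shape

    if weights is None:
        weights = np.full(k, 1.0 / k)
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    # Step 1: pick a mixture component for every sample.
    comp = rng.choice(k, size=n_samples, p=weights)
    # Step 2: add isotropic Gaussian noise around the chosen component mean.
    noise = rng.standard_normal((n_samples, d)) * sigma
    return centers[comp] + noise, comp


if __name__ == "__main__":
    # Toy stand-in for K=4 prompt embeddings of dimension 16.
    toy_centers = np.random.default_rng(1).standard_normal((4, 16))
    samples, comp = sample_mog_embeddings(toy_centers, sigma=0.1, n_samples=8)
    print(samples.shape, comp)
```

Intuitively, drawing the conditioning vector from a mixture rather than using one fixed embedding spreads the samples over several modes, which is one way to read the abstract's goal of increasing sampling entropy. In a real pipeline each sampled embedding would then condition one image generation.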
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques
