PromptMoG: Enhancing Diversity in Long-Prompt Image Generation via Prompt Embedding Mixture-of-Gaussian Sampling

Bo-Kai Ruan; Teng-Fang Hsiao; Ling Lo; Yi-Lun Wu; Hong-Han Shuai

arXiv:2511.20251·cs.CV·November 26, 2025

TL;DR

This paper studies why diversity drops in long-prompt text-to-image generation, introduces the LPD-Bench benchmark for evaluating fidelity and diversity, and proposes PromptMoG, a training-free sampling method that enhances diversity without sacrificing fidelity.

Contribution

The paper systematically studies the fidelity-diversity trade-off under long prompts, introduces LPD-Bench for evaluation, and proposes PromptMoG, a training-free method based on Mixture-of-Gaussians sampling of prompt embeddings.

Findings

01 PromptMoG improves diversity across multiple models.

02 Long prompts tend to reduce diversity in generated images.

03 LPD-Bench provides a standardized way to evaluate fidelity and diversity.

Abstract

Recent advances in text-to-image (T2I) generation have achieved remarkable visual outcomes through large-scale rectified flow models. However, how these models behave under long prompts remains underexplored. Long prompts encode rich content, spatial, and stylistic information that enhances fidelity but often suppresses diversity, leading to repetitive and less creative outputs. In this work, we systematically study this fidelity-diversity dilemma and reveal that state-of-the-art models exhibit a clear drop in diversity as prompt length increases. To enable consistent evaluation, we introduce LPD-Bench, a benchmark designed for assessing both fidelity and diversity in long-prompt generation. Building on our analysis, we develop a theoretical framework that increases sampling entropy through prompt reformulation and propose a training-free method, PromptMoG, which samples prompt…
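The abstract is cut off before it describes the sampling step, so the following is only a minimal sketch of what drawing prompt embeddings from a Mixture-of-Gaussians could look like in general, not the authors' implementation. Everything named here is an assumption made for illustration: the function `sample_mog_embeddings`, the isotropic covariance controlled by `sigma`, the uniform default `weights`, and the idea that `centers` holds embeddings of several reformulations of one long prompt.

```python
import numpy as np

def sample_mog_embeddings(centers, sigma, n_samples, weights=None, seed=0):
    """Draw samples from an isotropic Mixture-of-Gaussians.

    centers : (K, D) array, one mean per mixture component. Assumed here to be
              embeddings of K reformulated prompts (hypothetical setup).
    sigma   : shared standard deviation of every component (assumption).
    weights : optional (K,) mixing weights; uniform if omitted.
    Returns the (n_samples, D) samples and the chosen component indices.
    """
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=np.float64)
    k, d = centers.shape
    if weights is None:
        weights = np.full(k, 1.0 / k)
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    # Step 1: pick a mixture component for each sample.
    comp = rng.choice(k, size=n_samples, p=weights)
    # Step 2: add Gaussian noise around the chosen component mean.
    noise = rng.standard_normal((n_samples, d)) * sigma
    return centers[comp] + noise, comp

# Toy usage with three 4-dimensional stand-in "embeddings".
toy_centers = np.eye(3, 4)
samples, comp = sample_mog_embeddings(toy_centers, sigma=0.1, n_samples=5)
print(samples.shape, comp)
```

In a real pipeline each sampled vector would stand in for the conditioning embedding of one generated image; how the paper chooses the component means, the variance, and the weights is not recoverable from the truncated abstract.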

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques