Dominating vs. Dominated: Generative Collapse in Diffusion Models
Hayeon Jeong, Jong-Seok Lee

TL;DR
This paper investigates the dominance phenomenon in text-to-image diffusion models, in which one concept in a multi-concept prompt suppresses the others during image generation, and analyzes its causes to inform more reliable control.
Contribution
It introduces DominanceBench, systematically analyzes the causes of dominance imbalance, and reveals how data and attention mechanisms contribute to generative collapse in diffusion models.
Findings
Limited data diversity worsens inter-concept interference.
Dominant tokens saturate attention, suppressing others over time.
Dominance behavior arises from attention mechanisms distributed across multiple heads.
Abstract
Text-to-image diffusion models have drawn significant attention for their ability to generate diverse and high-fidelity images. However, when generating from multi-concept prompts, one concept token often dominates the generation, suppressing the others, a phenomenon we term the Dominant-vs-Dominated (DvD) imbalance. To systematically analyze this imbalance, we introduce DominanceBench and examine its causes from both data and architectural perspectives. Through various experiments, we show that the limited instance diversity in training data exacerbates the inter-concept interference. Analysis of cross-attention dynamics further reveals that dominant tokens rapidly saturate attention, progressively suppressing others across diffusion timesteps. In addition, head ablation studies show that the DvD behavior arises from distributed attention mechanisms across multiple heads. Our findings…
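The attention saturation described in the abstract can be illustrated with a small sketch. This is not the authors' analysis code: the array layout (timesteps, heads, queries, tokens), the function name token_attention_share, and the synthetic logits are all assumptions made for illustration. It only shows one plausible way to track how much cross-attention mass each concept token receives over timesteps.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_attention_share(attn, token_ids):
    """Relative attention share of selected prompt tokens per timestep.

    attn: array of shape (T, H, Q, K) holding cross-attention probabilities
          (assumed layout: timesteps, heads, image queries, text tokens).
    token_ids: indices of the concept tokens to compare (hypothetical).
    Returns an array of shape (T, len(token_ids)) whose rows sum to 1.
    """
    attn = np.asarray(attn, dtype=float)
    # Average the attention mass of each selected token over heads and queries.
    mass = attn[..., token_ids].mean(axis=(1, 2))
    # Renormalize across the selected tokens only.
    return mass / mass.sum(axis=1, keepdims=True)

# Synthetic example (assumed values): token 0's logit grows with the step
# index, mimicking a token that progressively saturates attention.
T, H, Q, K = 6, 4, 16, 8
logits = np.zeros((T, H, Q, K))
for t in range(T):
    logits[t, :, :, 0] = 0.5 * t

attn = softmax(logits, axis=-1)
share = token_attention_share(attn, [0, 1])
print(share)
```

In this toy setup the first column of share rises with the step index while the second falls, which is the qualitative pattern the abstract attributes to dominant and dominated tokens. With a real model, attn would have to be collected from the cross-attention layers during sampling, which this sketch does not cover.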
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception · Aesthetic Perception and Analysis
