The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models
Zhang Xiaofeng, Aaron Courville, Michal Drozdzal, Adriana Romero-Soriano

TL;DR
This paper investigates how prompt complexity affects the quality, diversity, and consistency of images generated by text-to-image models, revealing trade-offs and proposing an evaluation framework.
Contribution
It introduces a new framework for comparing real and synthetic data utility and analyzes the impact of prompt complexity on T2I model outputs across multiple datasets.
Findings
Higher prompt complexity reduces diversity and consistency.
Increasing prompt complexity decreases distribution shift between synthetic and real data.
Prompt expansion with a language model improves diversity and aesthetics.
Abstract
Text-to-image (T2I) models offer great potential for creating virtually limitless synthetic data, a valuable resource compared to fixed and finite real datasets. Previous works evaluate the utility of synthetic data from T2I models on three key desiderata: quality, diversity, and consistency. While prompt engineering is the primary means of interacting with T2I models, the systematic impact of prompt complexity on these critical utility axes remains underexplored. In this paper, we first conduct synthetic experiments to motivate the difficulty of generalization with regard to prompt complexity and explain the observed difficulty with theoretical derivations. Then, we introduce a new evaluation framework that can compare the utility of real data and synthetic data, and present a comprehensive analysis of how prompt complexity influences the utility of synthetic data generated by commonly…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
