An Exploration of Default Images in Text-to-Image Generation
Hannu Simonen, Atte Kiviniemi, Hannah Johnston, Helena Barranha, Jonas Oppenlaender

TL;DR
This paper investigates default images in text-to-image generation on Midjourney: images that closely resemble each other across many unrelated prompts. It shows that such images recur consistently and examines how they may affect user satisfaction, with the aim of informing better prompt engineering and TTI generation.
Contribution
First investigation into default images on Midjourney, combining manually created prompts and ablation studies, a computational analysis of over 750,000 images, and an online user study to understand their nature and effects.
Findings
Default images are consistent across unrelated prompts.
Default images may affect user satisfaction, as examined in an online user study.
A computational analysis of over 750,000 images reveals recurring default images.
Abstract
In the creative practice of text-to-image (TTI) generation, images are synthesized from textual prompts. By design, TTI models always yield an output, even if the prompt contains unknown terms. In this case, the model may generate default images: images that closely resemble each other across many unrelated prompts. Studying default images is valuable for designing better solutions for prompt engineering and TTI generation. We present the first investigation into default images on Midjourney. We describe an initial study in which we manually created input prompts triggering default images, and several ablation studies. Building on these, we conduct a computational analysis of over 750,000 images, revealing consistent default images across unrelated prompts. We also conduct an online user study investigating how default images may affect user satisfaction. Our work lays the foundation…
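The idea of a default image, one that looks nearly the same across unrelated prompts, can be illustrated with a small sketch. This is not the authors' pipeline, which the abstract does not describe. It assumes each generated image has already been reduced to an embedding vector by some image encoder, and it uses word-overlap (Jaccard) similarity as a crude stand-in for prompt relatedness. The function names, thresholds, and toy data are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(prompt_a, prompt_b):
    """Word-level Jaccard similarity between two prompts."""
    words_a = set(prompt_a.lower().split())
    words_b = set(prompt_b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def default_image_candidates(records, image_threshold=0.9, prompt_threshold=0.2):
    """Return index pairs whose images are similar but whose prompts are not.

    records: list of (prompt, embedding) tuples.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    pairs = []
    for (i, (prompt_a, emb_a)), (j, (prompt_b, emb_b)) in combinations(
        enumerate(records), 2
    ):
        similar_images = cosine(emb_a, emb_b) >= image_threshold
        unrelated_prompts = jaccard(prompt_a, prompt_b) <= prompt_threshold
        if similar_images and unrelated_prompts:
            pairs.append((i, j))
    return pairs


# Toy data: two nonsense prompts whose (made-up) embeddings nearly coincide,
# and one ordinary prompt with a clearly different embedding.
records = [
    ("zxqv blorptang", [0.90, 0.10, 0.00]),
    ("qwrtyp flimnax", [0.88, 0.12, 0.01]),
    ("a red bicycle on a beach", [0.00, 0.20, 0.95]),
]

candidates = default_image_candidates(records)
print(candidates)
```

On the toy data only the pair of nonsense prompts is flagged. The pairwise loop is quadratic in the number of images, so an analysis at the scale of hundreds of thousands of images would need clustering or approximate nearest-neighbor search instead.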
