Design Guidelines for Prompt Engineering Text-to-Image Generative Models
Vivian Liu, Lydia B. Chilton

TL;DR
This paper investigates how prompt keywords and hyperparameters influence the quality of images generated by text-to-image models, providing practical guidelines to improve output coherence and relevance.
Contribution
The paper offers a systematic analysis of prompt structures and hyperparameter effects, resulting in actionable design guidelines for better image generation.
Findings
Structured prompts with subject and style keywords improve coherence.
Certain hyperparameters significantly affect output quality.
Guidelines help users produce more relevant and coherent images.
Abstract
Text-to-image generative models are a new and powerful way to generate visual artwork. However, the open-ended nature of text as interaction is double-edged; while users can input anything and have access to an infinite range of generations, they also must engage in brute-force trial and error with the text prompt when the result quality is poor. We conduct a study exploring what prompt keywords and model hyperparameters can help produce coherent outputs. In particular, we study prompts structured to include subject and style keywords and investigate success and failure modes of these prompts. Our evaluation of 5493 generations over the course of five experiments spans 51 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people produce better outcomes from text-to-image generative models.
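Because the study varies prompts along two axes, subject keywords and style keywords, the prompt set for such an experiment can be enumerated systematically rather than typed by hand. The Python sketch below pairs every subject with every style through a single template. The template string `"{subject} in the style of {style}"`, the short keyword lists, and the `build_prompts` helper are illustrative assumptions, not the authors' exact materials or tooling.

```python
from itertools import product

# Hypothetical keyword lists for illustration only; the study's
# 51 subjects and 51 styles are not reproduced here.
SUBJECTS = ["a lighthouse", "love", "a bowl of fruit"]
STYLES = ["cubism", "ukiyo-e", "surrealism"]

# Assumed prompt template: a subject keyword followed by a style keyword.
TEMPLATE = "{subject} in the style of {style}"


def build_prompts(subjects, styles, template=TEMPLATE):
    """Return one prompt per (subject, style) pair, iterating styles fastest."""
    return [
        template.format(subject=subject, style=style)
        for subject, style in product(subjects, styles)
    ]


if __name__ == "__main__":
    prompts = build_prompts(SUBJECTS, STYLES)
    print(len(prompts))  # 3 subjects x 3 styles = 9 prompts
    print(prompts[0])    # a lighthouse in the style of cubism
```

Each resulting prompt would then be passed to a text-to-image model; keeping the template fixed isolates the effect of the subject and style keywords from other wording changes.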
Taxonomy
Topics: Aesthetic Perception and Analysis · Data Visualization and Analytics · Human Motion and Animation
