Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks
Jennifer Haase, Jana Gonnermann-Müller, Paul H. P. Hanel, Nicolas Leins, Thomas Kosch, Jan Mendling, Sebastian Pokutta

TL;DR
This study quantifies how much prompts, model choice, and sampling randomness each contribute to variance in large language model outputs on creative tasks. Prompts explain about as much variance in output quality as model choice does, while output quantity is driven mainly by model choice and within-model stochasticity.
Contribution
It analyzes the sources of output variability in LLMs, comparing the relative influence of prompts, model choice, and sampling noise in creative tasks.
Findings
Prompts explain 36.43% of output quality variance.
Model choice explains 40.94% of output quality variance.
Within-LLM stochasticity accounts for 33.70% of output quantity variance.
Abstract
How much of LLM output variance is explained by prompts versus model choice versus stochasticity through sampling? We answer this by evaluating 12 LLMs on 10 creativity prompts with 100 samples each (N = 12,000). For output quality (originality), prompts explain 36.43% of variance, comparable to model choice (40.94%). But for output quantity (fluency), model choice (51.25%) and within-LLM variance (33.70%) dominate, with prompts explaining only 4.22%. Prompts are powerful levers for steering output quality, but given the substantial within-LLM variance (10-34%), single-sample evaluations risk conflating sampling noise with genuine prompt or model effects.
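The kind of variance partitioning described in the abstract can be illustrated with a small sketch. The abstract does not say which statistical procedure the authors used, so the code below assumes a plain balanced two-way sum-of-squares decomposition (model, prompt, model-by-prompt interaction, and within-cell sampling noise). The function name `variance_shares`, the array layout, and the synthetic scores are all hypothetical; only the design size (12 models, 10 prompts, 100 samples per cell, N = 12,000) comes from the abstract.

```python
import numpy as np


def variance_shares(scores):
    """Split total variance of a balanced models x prompts x samples array.

    Assumption: a classic two-way sum-of-squares decomposition. This is an
    illustrative stand-in, not necessarily the procedure used in the paper.
    Returns the percentage of the total sum of squares per source.
    """
    n_models, n_prompts, n_samples = scores.shape
    grand = scores.mean()
    model_means = scores.mean(axis=(1, 2))   # one mean per model
    prompt_means = scores.mean(axis=(0, 2))  # one mean per prompt
    cell_means = scores.mean(axis=2)         # one mean per model-prompt cell

    ss_total = ((scores - grand) ** 2).sum()
    ss_model = n_prompts * n_samples * ((model_means - grand) ** 2).sum()
    ss_prompt = n_models * n_samples * ((prompt_means - grand) ** 2).sum()
    ss_cells = n_samples * ((cell_means - grand) ** 2).sum()
    ss_interaction = ss_cells - ss_model - ss_prompt
    # Within-cell spread: repeated samples from the same model and prompt.
    ss_within = ((scores - cell_means[:, :, None]) ** 2).sum()

    parts = {
        "model": ss_model,
        "prompt": ss_prompt,
        "interaction": ss_interaction,
        "within": ss_within,
    }
    return {name: 100.0 * ss / ss_total for name, ss in parts.items()}


# Synthetic scores (hypothetical, not the paper's data): additive model and
# prompt effects plus per-sample noise, in the same 12 x 10 x 100 layout.
rng = np.random.default_rng(0)
model_effect = rng.normal(0.0, 1.0, size=(12, 1, 1))
prompt_effect = rng.normal(0.0, 1.0, size=(1, 10, 1))
noise = rng.normal(0.0, 1.0, size=(12, 10, 100))
scores = model_effect + prompt_effect + noise

shares = variance_shares(scores)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of total variance")
```

In a balanced design the four shares sum to 100%. The "within" share is the part that a single-sample evaluation cannot separate from prompt or model effects, which is the risk the abstract points to.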
Taxonomy
Topics: Artificial Intelligence in Games · Creativity in Education and Neuroscience · Wikis in Education and Collaboration
