Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers
Samuel Schapiro, Alexi Gladstone, Jonah Black, Heng Ji

TL;DR
This paper evaluates existing creativity tests for large language models (LLMs), identifies their limitations, and introduces the DRAT, a new test that combines divergent and convergent thinking assessment and reliably predicts scientific ideation.
Contribution
It systematically assesses the validity of human creativity tests for LLMs and introduces the DRAT, the first test that effectively predicts scientific ideation.
Findings
The DAT and the Conditional DAT predict creative writing and divergent thinking.
No existing test reliably predicts scientific ideation.
The DRAT is the first test to predict scientific ideation; it combines divergent and convergent thinking.
Abstract
Measuring the creativity of large language models (LLMs) is essential for designing methods that can improve creativity and for enhancing our scientific understanding of this ability. To accomplish this, it has become common in recent years to administer tests of human creativity to LLMs. Although these tests provide a convenient and fully automated way to score "creativity," their validity as measures of machine creativity has not been established, and these tests already have limited validity as predictors of human creativity. To address this problem, we conduct the first large-scale, systematic study assessing the effectiveness of human creativity tests for predicting the creative achievement of LLMs across three target constructs: creative writing, divergent thinking, and scientific ideation. We find that the Divergent Association Task (DAT) and the Conditional DAT are the best…
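The abstract notes that tests such as the DAT give a fully automated creativity score. As we understand the scoring of the original human DAT, a respondent names 10 maximally unrelated nouns, the first 7 valid words are kept, and the score is the mean pairwise cosine distance between their word embeddings, multiplied by 100. The sketch below illustrates that procedure under stated assumptions: the function names `dat_score` and `cosine_distance` are hypothetical, the toy vectors stand in for the pretrained GloVe vectors used by the published task, and this is not the authors' implementation (their LLM adaptation and the Conditional DAT may differ).

```python
import itertools
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_score(words, embeddings, n_words=7):
    """Hypothetical DAT-style score: mean pairwise cosine distance (x100).

    Only the first n_words valid words are used. A word is valid if it has an
    embedding; case is normalized and duplicates are skipped.
    Returns None when fewer than n_words valid words are available.
    """
    valid = []
    for word in words:
        word = word.strip().lower()
        if word in embeddings and word not in valid:
            valid.append(word)
        if len(valid) == n_words:
            break
    if len(valid) < n_words:
        return None
    # Average the distance over every unordered pair of kept words.
    distances = [
        cosine_distance(embeddings[a], embeddings[b])
        for a, b in itertools.combinations(valid, 2)
    ]
    return 100.0 * sum(distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical toy vectors (assumption); the real task uses GloVe embeddings.
    toy = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "theory": [0.0, 1.0]}
    # "unicorn" has no toy embedding, so it is ignored; n_words=3 for brevity.
    print(dat_score(["Cat", "dog", "theory", "unicorn"], toy, n_words=3))
```

Semantically close words (here "cat" and "dog") pull the average distance down, while unrelated words push it toward 100, which is why the score is read as a measure of divergent thinking. Because the score depends entirely on the embedding model, swapping in different vectors will change absolute values.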
