TL;DR
CREATE is a benchmark that evaluates large language models' associative reasoning: models must generate diverse, specific paths connecting concepts, which lets creative capacity be graded objectively.
Contribution
The paper introduces CREATE, a novel benchmark for assessing models' associative creativity, emphasizing diversity, specificity, and the capacity to generate multiple meaningful connections.
Findings
Stronger models produce more diverse and specific paths.
Because each query admits many valid answers, the benchmark is difficult to saturate.
Creative prompting yields limited improvements.
Abstract
A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models to generate sets of paths connecting concepts in a model's parametric knowledge. Paths should have high specificity (distinctiveness and closeness of the concept connection) and high diversity (dissimilarity from other paths), and models are scored more highly if they produce a larger set of strong, diverse paths. This task shares demands of real creativity tasks like hypothesis generation, including an extremely large search space, but enables collection of a sizable benchmark with objective answer grading. Evaluation of frontier models shows that the strongest models achieve higher creative utility than others, with the high…
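To make the scoring idea in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's metric: the Jaccard distance, the greedy selection, the `min_distance` threshold, and the names `jaccard_distance` and `toy_utility` are all assumptions chosen for illustration. It only mirrors the stated intuition that a set of paths is worth more when it contains more paths that are both specific and dissimilar from one another.

```python
from typing import List, Sequence, Tuple

# A path is represented here by its intermediate concepts (hypothetical encoding).
Path = Tuple[str, ...]


def jaccard_distance(a: Path, b: Path) -> float:
    """Dissimilarity of two paths: 1 - |A ∩ B| / |A ∪ B| over their concepts."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty paths are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def toy_utility(
    paths: Sequence[Path],
    specificity: Sequence[float],
    min_distance: float = 0.5,  # assumed diversity threshold, not from the paper
) -> Tuple[float, List[Path]]:
    """Greedily keep the most specific paths that are diverse enough.

    Paths are visited in descending specificity. A path is kept only if it is
    at least `min_distance` away from every path already kept. The score is
    the sum of the specificity values of the kept paths.
    """
    if len(paths) != len(specificity):
        raise ValueError("paths and specificity must have the same length")

    order = sorted(range(len(paths)), key=lambda i: specificity[i], reverse=True)
    kept: List[int] = []
    total = 0.0
    for i in order:
        if all(jaccard_distance(paths[i], paths[j]) >= min_distance for j in kept):
            kept.append(i)
            total += specificity[i]
    return total, [paths[i] for i in kept]


if __name__ == "__main__":
    # Placeholder concepts; the specificity scores are made up for the example.
    candidate_paths = [("x", "y"), ("x", "y", "z"), ("p", "q")]
    scores = [1.0, 0.75, 0.5]
    print(toy_utility(candidate_paths, scores))
```

In this example the second path overlaps heavily with the first and is therefore skipped, while the third path shares no concepts with the first and still contributes. This shows how near-duplicate paths add nothing to such a score while a larger set of distinct paths raises it.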
