CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints

Anirudh Atmakuru; Jatin Nainani; Rohith Siddhartha Reddy Bheemreddy; Anirudh Lakkaraju; Zonghai Yao; Hamed Zamani; Haw-Shiuan Chang

arXiv:2410.04197·cs.CL·October 8, 2024

TL;DR

This paper introduces CS4, a benchmark dataset that measures the creativity of large language models in story writing by controlling prompt constraints, revealing how different models balance creativity, instruction-following, and coherence.

Contribution

The paper presents a novel benchmark dataset, CS4, that assesses LLM creativity through prompt constraint variation, enabling indirect measurement without human annotations.

Findings

1. Different LLMs perform very differently as the number of constraints changes.

2. Increasing the number of constraints hinders models from retelling stories in their training data.

3. Learning from Human Feedback improves story selection but not creativity.

Abstract

Evaluating the creativity of large language models (LLMs) in story writing is difficult because LLM-generated stories could seemingly look creative but be very similar to some existing stories in their huge and proprietary training corpus. To overcome this challenge, we introduce a novel benchmark dataset with varying levels of prompt specificity: CS4 (Comparing the Skill of Creating Stories by Controlling the Synthesized Constraint Specificity). By increasing the number of requirements/constraints in the prompt, we can increase the prompt specificity and hinder LLMs from retelling high-quality narratives in their training data. Consequently, CS4 empowers us to indirectly measure the LLMs' creativity without human annotations. Our experiments on LLaMA, Gemma, and Mistral not only highlight the…
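The idea of raising prompt specificity by adding constraints can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `build_prompts`, the prompt wording, the example constraints, and the choice to nest constraint sets (so each level only adds to the previous one) are all assumptions made for illustration.

```python
import random


def build_prompts(base_instruction, constraints, levels, seed=0):
    """Return {k: prompt} where each prompt carries k constraints.

    Hypothetical helper: constraints are shuffled once, then the first k
    are used at level k, so higher levels strictly extend lower ones.
    """
    rng = random.Random(seed)
    ordered = list(constraints)
    rng.shuffle(ordered)

    prompts = {}
    for k in levels:
        if k < 0 or k > len(ordered):
            raise ValueError(f"level {k} needs between 0 and {len(ordered)} constraints")
        lines = [base_instruction]
        if k > 0:
            lines.append("The story must satisfy all of the following constraints:")
            lines.extend(f"{i}. {c}" for i, c in enumerate(ordered[:k], start=1))
        prompts[k] = "\n".join(lines)
    return prompts


# Illustrative inputs only; these are not taken from the CS4 dataset.
base = "Write a short story about a lighthouse keeper."
constraints = [
    "The keeper never speaks aloud.",
    "A storm arrives in the second paragraph.",
    "The story ends with a letter.",
    "No character is named.",
]
prompts = build_prompts(base, constraints, levels=[0, 2, 4])
print(prompts[4])
```

Generating a story from each prompt with the same model, then scoring the outputs, would give one curve per model over the number of constraints. Nesting the constraint sets keeps the levels comparable, although an independent sample per level would be an equally reasonable design.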

Code & Models

Repositories

anirudhlakkaraju/cs4_benchmark (Official)

Videos

CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Artificial Intelligence in Games

Methods: LLaMA