The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario

Carlos Gómez-Rodríguez; Paul Williams

arXiv:2406.15891 · cs.CL · June 25, 2024

TL;DR

This paper evaluates recent instruction-tuned large language models on a creative writing task using a unique prompt to assess their creativity, style, and humor, comparing their performance to human writers.

Contribution

It introduces a novel prompt-based evaluation method to fairly compare LLMs and humans in creative writing, highlighting the strengths and limitations of current models.

Findings

01. Some commercial LLMs match or outperform humans in fluency and style.

02. Open-source LLMs lag behind commercial models.

03. Humans excel in originality and in handling humor.

Abstract

This is a summary of the paper "A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing", which was published in Findings of EMNLP 2023. We evaluate a range of recent state-of-the-art, instruction-tuned large language models (LLMs) on an English creative writing task, and compare them to human writers. For this purpose, we use a specifically-tailored prompt (based on an epic combat between Ignatius J. Reilly, main character of John Kennedy Toole's "A Confederacy of Dunces", and a pterodactyl) to minimize the risk of training data leakage and force the models to be creative rather than reusing existing stories. The same prompt is presented to LLMs and human writers, and evaluation is performed by humans using a detailed rubric including various aspects like fluency, style, originality or humor. Results show that some state-of-the-art commercial LLMs match or…

Taxonomy

Topics: Academic integrity and plagiarism · Legal Education and Practice Innovations · Artificial Intelligence Applications