The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario
Carlos Gómez-Rodríguez, Paul Williams

TL;DR
This paper evaluates recent instruction-tuned large language models on a creative writing task using a unique prompt to assess their creativity, style, and humor, comparing their performance to human writers.
Contribution
It introduces a novel prompt-based evaluation method to fairly compare LLMs and humans in creative writing, highlighting the strengths and limitations of current models.
Findings
Some commercial LLMs match or outperform humans in fluency and style.
Open-source LLMs lag behind commercial models.
Humans excel in originality and in handling humor.
Abstract
This is a summary of the paper "A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing", which was published in Findings of EMNLP 2023. We evaluate a range of recent state-of-the-art, instruction-tuned large language models (LLMs) on an English creative writing task, and compare them to human writers. For this purpose, we use a specifically tailored prompt (based on an epic combat between Ignatius J. Reilly, main character of John Kennedy Toole's "A Confederacy of Dunces", and a pterodactyl) to minimize the risk of training data leakage and force the models to be creative rather than reuse existing stories. The same prompt is presented to LLMs and human writers, and evaluation is performed by humans using a detailed rubric covering aspects such as fluency, style, originality, and humor. Results show that some state-of-the-art commercial LLMs match or…
Taxonomy
Topics: Academic integrity and plagiarism · Legal Education and Practice Innovations · Artificial Intelligence Applications
