A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing
Carlos Gómez-Rodríguez, Paul Williams

TL;DR
This study evaluates a range of large language models on a complex creative writing task. Some commercial models match or slightly outperform human writers in most dimensions, including fluency and coherence, but humans retain an edge in creativity, and only some LLMs handle humor comparably to humans.
Contribution
It provides a thorough comparison of recent LLMs on creative writing, using a novel, open-ended scenario to assess their capabilities across multiple criteria.
Findings
State-of-the-art commercial LLMs match or outperform humans in fluency and coherence.
Open-source LLMs lag behind commercial models in creative writing.
Humans retain an edge in creativity, while humor divides LLMs into those that handle it comparably to humans and those that fail at it.
Abstract
We evaluate a range of recent LLMs on English creative writing, a challenging and complex task that requires imagination, coherence, and style. We use a difficult, open-ended scenario chosen to avoid training data reuse: an epic narration of a single combat between Ignatius J. Reilly, the protagonist of the Pulitzer Prize-winning novel A Confederacy of Dunces (1980), and a pterodactyl, a prehistoric flying reptile. We ask several LLMs and humans to write such a story and conduct a human evaluation involving various criteria such as fluency, coherence, originality, humor, and style. Our results show that some state-of-the-art commercial LLMs match or slightly outperform our writers in most dimensions, whereas open-source LLMs lag behind. Humans retain an edge in creativity, while humor shows a binary divide between LLMs that can handle it comparably to humans and those that fail at it. We…
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Artificial Intelligence in Games
