Using Machine Learning to Distinguish Human-written from Machine-generated Creative Fiction
Andrea Cristina McGlinchey, Peter J Barclay

TL;DR
This study trains machine learning classifiers to distinguish human-written from machine-generated creative fiction, using short samples drawn from classic detective novels. The classifiers outperform human judges, which could help protect literary authenticity.
Contribution
Introduces novel ML-based classifiers for detecting machine-generated creative fiction, demonstrates their high accuracy, and deploys a practical tool for industry use.
Findings
Naive Bayes and MLP classifiers achieve over 95% accuracy.
Models outperform human judges significantly.
Effective with short text samples (~100 words).
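To make the Naive Bayes finding concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation: the class name `NaiveBayesTextClassifier`, the tokenizer, and the four training sentences are all invented for illustration, and a real experiment would use many samples of around 100 words each.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical bag-of-words multinomial Naive Bayes (illustration only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def log_scores(self, text):
        """Return log prior plus summed log likelihoods for each class."""
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * vocab_size
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    count = self.word_counts[c][tok]
                    score += math.log((count + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data, far smaller than anything a real study would use.
train_texts = [
    "the inspector lit his pipe and stared gloomily at the muddy footprints",
    "she said nothing but her gloved hand trembled on the bannister",
    "the detective felt a sense of determination as he uncovered the truth",
    "a sense of mystery filled the air as the detective uncovered the secret",
]
train_labels = ["human", "human", "machine", "machine"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
human_guess = clf.predict("the inspector stared at the muddy bannister")
machine_guess = clf.predict("the detective uncovered a sense of mystery")
```

With this toy data, `human_guess` evaluates to `"human"` and `machine_guess` to `"machine"`, because each query reuses words that occur in only one class. The sketch shows only the mechanics of the technique; it says nothing about the accuracy figures reported above.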
Abstract
Following the universal availability of generative AI systems with the release of ChatGPT, automatic detection of deceptive text created by Large Language Models has focused on domains such as academic plagiarism and "fake news". However, generative AI also poses a threat to the livelihood of creative writers, and perhaps to literary culture in general, through reduction in quality of published material. Training a Large Language Model on writers' output to generate "sham books" in a particular style seems to constitute a new form of plagiarism. This problem has been little researched. In this study, we trained Machine Learning classifier models to distinguish short samples of human-written from machine-generated creative fiction, focusing on classic detective novels. Our results show that a Naive Bayes and a Multi-Layer Perceptron classifier achieved a high degree of success (accuracy…
Taxonomy
Topics: Computational and Text Analysis Methods · Digital Humanities and Scholarship · Artificial Intelligence in Games
