Stylometry recognizes human and LLM-generated texts in short samples
Karol Przystalski, Jan K. Argasiński, Iwona Grabska-Gradzińska, Jeremi K. Ochab

TL;DR
This study demonstrates that stylometry can distinguish human-written from LLM-generated texts even in short samples, reaching binary classification accuracy of up to 0.98 and identifying stylistic features characteristic of each.
Contribution
The paper introduces a new benchmark dataset and applies stylometric analysis with machine learning to accurately classify texts as human or machine-generated.
Findings
Achieved a Matthews correlation coefficient of up to 0.87 in multiclass classification.
Binary classification accuracy reached up to 0.98 when distinguishing Wikipedia texts from GPT-4 texts.
Identified stylistic features characteristic of LLM-generated and of human-written texts.
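For readers unfamiliar with the metric in the first finding, here is a minimal sketch of the multiclass Matthews correlation coefficient, computed from per-class counts. The function name `multiclass_mcc` and the example labels are illustrative assumptions, not taken from the paper, and this is not the authors' evaluation code.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (hypothetical helper).

    Uses the standard generalisation
        (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k how often class k truly occurs and p_k how often it is predicted.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    var_true = s * s - sum(v * v for v in t_counts.values())
    var_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = math.sqrt(var_true * var_pred)

    # By convention, return 0.0 when the score is undefined
    # (for example, when every sample is assigned to one class).
    return numerator / denominator if denominator else 0.0


# Illustrative labels only; the class names are placeholders.
truth = ["human", "gpt4", "llama", "human"]
print(multiclass_mcc(truth, truth))
```

Unlike plain accuracy, the score stays near zero for a classifier that always predicts the majority class, which is why it is a common choice when several classes are compared at once.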
Abstract
The paper explores stylometry as a method to distinguish between texts created by Large Language Models (LLMs) and humans, addressing issues of model attribution, intellectual property, and ethical AI use. Stylometry has been used extensively to characterise the style and attribute authorship of texts. By applying it to LLM-generated texts, we identify their emergent writing patterns. The paper involves creating a benchmark dataset based on Wikipedia, with (a) human-written term summaries, (b) texts generated purely by LLMs (GPT-3.5/4, LLaMa 2/3, Orca, and Falcon), (c) texts processed through multiple text summarisation methods (T5, BART, Gensim, and Sumy), and (d) texts processed through rephrasing methods (Dipper, T5). The 10-sentence-long texts were classified by tree-based models (decision trees and LightGBM) using human-designed (StyloMetrix) and n-gram-based (our own pipeline) stylometric features that encode…
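As a rough illustration of what n-gram-based stylometric features can look like, the sketch below turns a short text into relative frequencies of word n-grams and then into a fixed-length vector of the kind a tree-based classifier could consume. The abstract as shown here is truncated and does not say which units the authors' n-grams are built over, so word bigrams, whitespace tokenisation, and the names `word_ngram_profile` and `to_vector` are all assumptions made for this example, not a description of the authors' pipeline.

```python
from collections import Counter


def word_ngram_profile(text, n=2):
    """Relative frequencies of word n-grams in `text` (hypothetical helper).

    Tokenisation is a plain lowercase whitespace split, so punctuation
    stays attached to words; a real pipeline would likely do more.
    """
    tokens = text.lower().split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


def to_vector(profile, vocabulary):
    """Project a profile onto a fixed, ordered n-gram vocabulary."""
    return [profile.get(gram, 0.0) for gram in vocabulary]


# Toy input; real samples in the paper are 10 sentences long.
sample = "the cat sat on the mat the cat"
profile = word_ngram_profile(sample, n=2)
vocabulary = [("the", "cat"), ("a", "dog")]  # assumed, fixed in advance
print(to_vector(profile, vocabulary))
```

Using relative rather than raw frequencies keeps vectors comparable across texts of slightly different lengths, which matters when samples are short.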
