A Stylometric Application of Large Language Models

Harrison F. Stropkay; Jiayi Chen; Mohammad J. Latifi; Daniel N. Rockmore; Jeremy R. Manning

arXiv:2510.21958·cs.CL·October 28, 2025

TL;DR

This paper shows that a GPT-2 model trained from scratch on a single author's works predicts held-out text by that author more accurately than held-out text by other authors, which makes per-author models usable for authorship attribution and stylometry.

Contribution

The paper introduces a method that trains one LLM per author and uses each model's predictive accuracy on held-out text to capture and identify that author's distinctive writing style for authorship attribution.

Findings

01

A GPT-2 model trained on a single author predicts held-out text by that author more accurately than held-out text by other authors.

02

The approach successfully attributes authorship both for books by eight known authors and in a disputed case.

03

The method confirms R. P. Thompson's authorship of the disputed 15th book of the Oz series, originally attributed to F. L. Baum.

Abstract

We show that large language models (LLMs) can be used to distinguish the writings of different authors. Specifically, an individual GPT-2 model, trained from scratch on the works of one author, will predict held-out text from that author more accurately than held-out text from other authors. We suggest that, in this way, a model trained on one author's works embodies the unique writing style of that author. We first demonstrate our approach on books written by eight different (known) authors. We also use this approach to confirm R. P. Thompson's authorship of the well-studied 15th book of the Oz series, originally attributed to F. L. Baum.
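
The decision rule described in the abstract (train one model per author, then attribute a held-out passage to the author whose model predicts it best) can be illustrated with a small sketch. This is not the paper's implementation: to stay self-contained, it swaps GPT-2 for a character-bigram model with add-one smoothing. The training strings, the author labels, the function names, and the 128-symbol alphabet size are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Assumption: text is plain ASCII, so the smoothing alphabet has 128 symbols.
VOCAB_SIZE = 128


def train_bigram(text):
    """Count character bigrams and left-context characters in one author's text."""
    pairs = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    return pairs, contexts


def cross_entropy(model, text, vocab_size=VOCAB_SIZE):
    """Average negative log-likelihood (nats per character) of `text`
    under an add-one smoothed bigram model. Lower means better prediction."""
    pairs, contexts = model
    total, n = 0.0, 0
    for a, b in zip(text, text[1:]):
        p = (pairs[(a, b)] + 1) / (contexts[a] + vocab_size)
        total -= math.log(p)
        n += 1
    return total / n


def attribute(models, held_out):
    """Return the author whose model gives the lowest loss, plus all losses."""
    losses = {author: cross_entropy(m, held_out) for author, m in models.items()}
    return min(losses, key=losses.get), losses


# Hypothetical toy "corpora"; real use would need full books per author.
TRAIN = {
    "author_a": "the cat sat on the mat. the cat ate the rat. " * 20,
    "author_b": "zebras quiz foxy jinx. vex my fjord quiz. " * 20,
}

# One stand-in model per author (the paper trains one GPT-2 per author).
models = {author: train_bigram(text) for author, text in TRAIN.items()}

best, losses = attribute(models, "the rat sat on the cat.")
print(best, losses)
```

In this sketch the per-character cross-entropy plays the role that predictive loss on held-out text plays for the per-author GPT-2 models: the passage is assigned to whichever model is least surprised by it.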
