Better Call Claude: Can LLMs Detect Changes of Writing Style?
Johannes Römisch, Svetlana Gorovaia, Mariia Halchynska, Gleb Schmidt, Ivan P. Yamshchikov

TL;DR
This paper evaluates how well large language models can detect sentence-level writing style changes without training, revealing their sensitivity to stylistic variations and their potential reliance on content-independent cues.
Contribution
It benchmarks four state-of-the-art LLMs on style change detection datasets, highlighting their sensitivity to stylistic shifts and their performance as a challenging baseline.
Findings
LLMs are sensitive to stylistic variations at sentence level
Their zero-shot accuracy outperforms the baselines suggested by the PAN competition
The latest LLMs may be more sensitive to content-independent, purely stylistic signals than previously reported
Abstract
This article explores the zero-shot performance of state-of-the-art large language models (LLMs) on one of the most challenging tasks in authorship analysis: sentence-level style change detection. Benchmarking four LLMs on the official PAN 2024 and 2025 "Multi-Author Writing Style Analysis" datasets, we present several observations. First, state-of-the-art generative models are sensitive to variations in writing style, even at the granular level of individual sentences. Second, their accuracy establishes a challenging baseline for the task, outperforming suggested baselines of the PAN competition. Finally, we explore the influence of semantics on model predictions and present evidence suggesting that the latest generation of LLMs may be more sensitive to content-independent and purely stylistic signals than previously reported.
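To make the task framing concrete, here is a minimal sketch of how sentence-level style change detection can be posed and scored. It assumes each document is a list of sentences with one binary label per pair of consecutive sentences (1 if the author changes, 0 otherwise) and that predictions are scored with macro-averaged F1. The metric choice, the helper names, and the toy author labels are assumptions for illustration and are not taken from the paper.

```python
from typing import List


def change_labels(authors: List[str]) -> List[int]:
    """Return 1 where the author differs between consecutive sentences, else 0."""
    return [int(a != b) for a, b in zip(authors, authors[1:])]


def f1_for_class(truth: List[int], pred: List[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(truth, pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(truth, pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(truth, pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(truth: List[int], pred: List[int]) -> float:
    """Unweighted mean of the F1 scores for 'no change' (0) and 'change' (1)."""
    return (f1_for_class(truth, pred, 0) + f1_for_class(truth, pred, 1)) / 2


# Hypothetical five-sentence document written by two authors.
truth = change_labels(["A", "A", "B", "B", "A"])
# Hypothetical model output: one prediction per consecutive sentence pair.
pred = [0, 1, 1, 1]
score = macro_f1(truth, pred)
print(truth, round(score, 4))
```

In a zero-shot setup, `pred` would come from parsing the LLM's answer for each sentence boundary. The macro average keeps a model from scoring well by always predicting the majority class.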
