Looking for the Inner Music: Probing LLMs' Understanding of Literary Style
Rebecca M. M. Hicke, David Mimno

TL;DR
This paper investigates how large language models can identify literary authorship and genre, revealing differences in how models memorize or learn stylistic features, and probing the features that define literary style.
Contribution
It extends LLM-based stylometry to a new dataset measuring novel genre, compares how much different models rely on memorization versus learned features, and probes one high-performing model for the features that define style.
Findings
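LLMs can distinguish both authorship and genre, but different models do so in different ways: some appear to rely more on memorization, while others benefit more from training.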
Author style is easier to define than genre-level style.
Pronoun usage and word order are key stylistic features.
Abstract
Recent work has demonstrated that language models can be trained to identify the author of much shorter literary passages than has been thought feasible for traditional stylometry. We replicate these results for authorship and extend them to a new dataset measuring novel genre. We find that LLMs are able to distinguish authorship and genre, but they do so in different ways. Some models seem to rely more on memorization, while others benefit more from training to learn author/genre characteristics. We then use three methods to probe one high-performing LLM for features that define style. These include direct syntactic ablations to input text as well as two methods that look at model internals. We find that authorial style is easier to define than genre-level style and is more impacted by minor syntactic decisions and contextual word usage. However, some traits like pronoun usage and word…
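The "direct syntactic ablations to input text" mentioned in the abstract can be pictured with a minimal sketch. The summary here does not describe the paper's actual ablation procedure, so everything below is an assumption made for illustration: the function names, the pronoun list, and the `[PRON]` placeholder are hypothetical. The idea is to perturb one trait of a passage (word order, or pronoun usage) and then compare a classifier's accuracy on the original and perturbed text; the comparison step is not shown.

```python
import random
import re

# Hypothetical pronoun list for illustration only; not taken from the paper.
PRONOUNS = {
    "i", "me", "my", "you", "your", "he", "him", "his", "she", "her",
    "it", "its", "we", "us", "our", "they", "them", "their",
}


def shuffle_word_order(text, seed=0):
    """Return the passage with its whitespace-separated words shuffled.

    This removes word-order information while keeping the same vocabulary.
    """
    words = text.split()
    rng = random.Random(seed)  # seeded so the ablation is reproducible
    rng.shuffle(words)
    return " ".join(words)


def mask_pronouns(text, placeholder="[PRON]"):
    """Replace every pronoun in PRONOUNS with a single placeholder token.

    This removes information about which pronouns an author prefers.
    """
    def replace(match):
        word = match.group(0)
        return placeholder if word.lower() in PRONOUNS else word

    # Match alphabetic words (with apostrophes) and leave punctuation alone.
    return re.sub(r"[A-Za-z']+", replace, text)


passage = "She gave him her book."
masked = mask_pronouns(passage)
shuffled = shuffle_word_order(passage)
```

If a style classifier's accuracy drops sharply on the masked or shuffled passages, that suggests the ablated trait carries stylistic signal for that classifier.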
Taxonomy
Topics: Digital Humanities and Scholarship
