Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings
Benjamin Icard, Lila Sainero, Alice Breton, Evangelia Zve, Jean-Gabriel Ganascia

TL;DR
This paper investigates how well language model embeddings encode authorial stylistic features in French and whether those features survive LLM rewriting. It finds that embeddings reliably capture stylistic signals and that these signals persist after rewriting.
Contribution
The paper provides the first systematic analysis of stylistic information in language model embeddings of French text and shows that this information persists after LLM rewriting, which points toward applications in authorship imitation detection.
Findings
Embeddings reliably encode authorial style in French.
Stylistic signals persist after LLM rewriting.
Embeddings exhibit LLM-specific stylistic patterns.
Abstract
Large language models (LLMs) can convincingly imitate human writing styles, yet it remains unclear how much stylistic information is encoded in embeddings from any language model and retained after LLM rewriting. We investigate these questions in French, using a controlled literary dataset to quantify the effect of stylistic variation via changes in embedding dispersion. We observe that embeddings reliably capture authorial stylistic features and that these signals persist after rewriting, while also exhibiting LLM-specific patterns. These analytical results offer promising directions for authorship imitation detection in the era of language models.
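The abstract does not say how embedding dispersion is computed, so the following is only a minimal sketch under stated assumptions. It takes dispersion to be the mean cosine distance of each text embedding to the centroid of its set, and it compares a set of original embeddings with a set of rewritten ones. The function names `dispersion` and `dispersion_change` are hypothetical, and the vectors are synthetic random data standing in for real embeddings, not results from the paper.

```python
import numpy as np

def dispersion(embeddings: np.ndarray) -> float:
    """Mean cosine distance of each row to the set's centroid.

    Assumption: this is one plausible definition of dispersion,
    not necessarily the metric used by the authors.
    """
    X = np.asarray(embeddings, dtype=float)
    # Normalize each embedding to unit length so only direction matters.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centroid = X.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    # Cosine distance = 1 - cosine similarity to the centroid.
    return float(np.mean(1.0 - X @ centroid))

def dispersion_change(original: np.ndarray, rewritten: np.ndarray) -> float:
    """Positive when the rewritten set is more spread out than the original."""
    return dispersion(rewritten) - dispersion(original)

# Synthetic stand-ins for embeddings (hypothetical data, 50 texts x 16 dims).
rng = np.random.default_rng(0)
base = rng.normal(size=16)
tight = base + 0.05 * rng.normal(size=(50, 16))  # stylistically homogeneous set
loose = base + 0.50 * rng.normal(size=(50, 16))  # more varied set

print(dispersion(tight), dispersion(loose), dispersion_change(tight, loose))
```

Under this definition, a set of texts sharing one style would be expected to cluster tightly (low dispersion), while mixing styles or rewriting would show up as a change in that value. Other choices, such as mean pairwise distance or variance along principal components, would be equally reasonable starting points.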
