Capturing Style in Author and Document Representation

Enzo Terreau; Antoine Gourru; Julien Velcin

arXiv:2407.13358 · cs.CL · June 27, 2025

Open Access

TL;DR

This paper introduces a model based on the Variational Information Bottleneck (VIB) that learns author and document embeddings emphasizing writing style, improving authorship attribution and stylistic analysis on literary and online datasets.

Contribution

It proposes a new architecture that explicitly captures writing style in author and document representations, enhancing interpretability and performance in authorship tasks.

Findings

01 Outperforms recent baselines in authorship attribution

02 Effectively captures stylistic features in literary and online data

03 Provides interpretable stylistic embeddings

Abstract

A wide range of deep Natural Language Processing (NLP) models integrate continuous, low-dimensional representations of words and documents. Surprisingly, very few models study representation learning for authors. Author representations can be used for many NLP tasks, such as author identification and classification, or in recommendation systems. A strong limitation of existing work is that it does not explicitly capture writing style, which makes it hardly applicable to literary data. We therefore propose a new architecture based on the Variational Information Bottleneck (VIB) that learns embeddings for both authors and documents with a stylistic constraint. Our model fine-tunes a pre-trained document encoder. We stimulate the detection of writing style by adding predefined stylistic features, making the representation axes interpretable with respect to writing style indicators. We evaluate…
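The abstract does not spell out the training objective, so the following is only a minimal sketch of a generic VIB-style loss, not the authors' model. It assumes each document is encoded as a diagonal Gaussian (mean `mu`, log-variance `log_var`) that is pulled toward a standard normal prior, with a cross-entropy term for predicting a label such as the author. The function names, the standard normal prior, and the weight `beta` are illustrative assumptions.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per row."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def sample_embedding(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vib_loss(logits, labels, mu, log_var, beta=1e-2):
    """Mean cross-entropy plus a beta-weighted compression (KL) term.

    `beta` is a hypothetical trade-off weight, not a value from the paper.
    """
    # Numerically stable log-softmax over the class dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float(np.mean(nll + beta * kl_to_standard_normal(mu, log_var)))
```

In a sketch like this, a larger `beta` compresses the embedding more strongly toward the prior, while a smaller `beta` favors the prediction term. How the stylistic features enter the objective is not recoverable from the truncated abstract and is left out here.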

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Topic Modeling