Capturing Style in Author and Document Representation
Enzo Terreau, Antoine Gourru, Julien Velcin

TL;DR
This paper introduces a model based on the Variational Information Bottleneck (VIB) that learns author and document embeddings emphasizing writing style, improving authorship attribution and stylistic analysis across literary and online datasets.
Contribution
It proposes a new architecture that explicitly captures writing style in author and document representations, enhancing interpretability and performance in authorship tasks.
Findings
Outperforms recent baselines in authorship attribution
Effectively captures stylistic features in literary and online data
Provides interpretable stylistic embeddings
Abstract
A wide range of deep Natural Language Processing (NLP) models integrate continuous, low-dimensional representations of words and documents. Surprisingly, very few models study representation learning for authors. These representations can be used for many NLP tasks, such as author identification and classification, or in recommendation systems. A strong limitation of existing works is that they do not explicitly capture writing style, which makes them hard to apply to literary data. We therefore propose a new architecture based on the Variational Information Bottleneck (VIB) that learns embeddings for both authors and documents with a stylistic constraint. Our model fine-tunes a pre-trained document encoder. We stimulate the detection of writing style by adding predefined stylistic features, making the representation axes interpretable with respect to writing style indicators. We evaluate…
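For readers unfamiliar with the Variational Information Bottleneck mentioned in the abstract, the following is a minimal, generic sketch of a VIB-style objective, not the authors' implementation. It assumes each embedding is a diagonal Gaussian with mean `mu` and log-variance `log_var`, a standard normal prior, and a scalar `task_loss` computed elsewhere. The function names and the weight `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per row."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def sample_embedding(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def vib_loss(task_loss, mu, log_var, beta=1e-3):
    """Task loss plus a beta-weighted compression (KL) penalty.

    `beta` is an assumed hyperparameter trading off task accuracy
    against how much information the embedding keeps.
    """
    return task_loss + beta * float(np.mean(kl_to_standard_normal(mu, log_var)))


# Illustrative usage with made-up numbers (not data from the paper).
rng = np.random.default_rng(0)
mu = np.array([[1.0, 0.0], [0.0, 0.0]])
log_var = np.zeros_like(mu)
z = sample_embedding(mu, log_var, rng)
loss = vib_loss(task_loss=2.0, mu=mu, log_var=log_var, beta=0.1)
```

In this sketch the KL term pulls every embedding toward the prior, so only information that lowers the task loss is worth encoding. How the paper combines this with its stylistic features is not shown here.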
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Topic Modeling
