Same Author or Just Same Topic? Towards Content-Independent Style Representations

Anna Wegmann; Marijn Schraagen; Dong Nguyen

arXiv:2204.04907·cs.CL·April 12, 2022

TL;DR

This paper modifies the authorship verification (AV) training task to control for content, so that the learned style representations depend on stylistic features rather than on what the text is about.

Contribution

The paper introduces a variation of the authorship verification (AV) training task that controls for content using conversation or domain labels, so that the resulting representations encode style rather than content.
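
One way to picture content control in AV training is as a constraint on how training examples are sampled. The sketch below is illustrative only and is not taken from the paper's code: the `Utterance` type, the `sample_triples` helper, and the specific sampling rule (same-author text drawn from a different conversation, different-author text drawn from the anchor's own conversation) are assumptions made for this example. The intent is that topic overlap cannot be used as a shortcut for "same author".

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Utterance(NamedTuple):
    # Hypothetical record type; field names are assumptions for this sketch.
    author: str
    conversation: str
    text: str


def sample_triples(utterances, rng, content_control=True):
    """Build (anchor, same_author, different_author) triples.

    Assumed rule, not confirmed by the source: with content_control the
    different-author text comes from the anchor's conversation and the
    same-author text comes from another conversation.
    """
    by_author = defaultdict(list)
    by_conversation = defaultdict(list)
    for u in utterances:
        by_author[u.author].append(u)
        by_conversation[u.conversation].append(u)

    triples = []
    for anchor in utterances:
        if content_control:
            positives = [u for u in by_author[anchor.author]
                         if u.conversation != anchor.conversation]
            pool = by_conversation[anchor.conversation]
        else:
            positives = [u for u in by_author[anchor.author] if u is not anchor]
            pool = utterances
        negatives = [u for u in pool if u.author != anchor.author]
        # Skip anchors that cannot be paired under the chosen constraint.
        if positives and negatives:
            triples.append((anchor, rng.choice(positives), rng.choice(negatives)))
    return triples


# Toy data: carol has a single utterance, so she never becomes an anchor.
toy = [
    Utterance("alice", "c1", "honestly no idea lol"),
    Utterance("bob", "c1", "I do not believe that is correct."),
    Utterance("alice", "c2", "yeah that movie was great tbh"),
    Utterance("bob", "c2", "The film was, in my view, overrated."),
    Utterance("carol", "c3", "Thanks!"),
]
triples = sample_triples(toy, random.Random(0))
```

Each triple could then feed a contrastive or triplet objective. Setting `content_control=False` gives an uncontrolled baseline in which the different-author text may come from any conversation.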

Findings

01 Controlling for conversation improves style representation.

02 Content control enhances style-content disentanglement.

03 Style representations better reflect stylistic features when content is controlled.

Abstract

Linguistic style is an integral component of language. Recent advances in the development of style representations have increasingly used training objectives from authorship verification (AV): Do two texts have the same author? The assumption underlying the AV training task (same author approximates same writing style) enables self-supervised and, thus, extensive training. However, a good performance on the AV task does not ensure good "general-purpose" style representations. For example, as the same author might typically write about certain topics, representations trained on AV might also encode content information instead of style alone. We introduce a variation of the AV training task that controls for content using conversation or domain labels. We evaluate whether known style dimensions are represented and preferred over content information through an original variation to the…

Code & Models

Repositories

nlpsoc/stel (official)

Models

AnnaWegmann/Style-Embedding (Hugging Face)
model · 9.0k downloads · 23 likes

Datasets

AnnaWegmann/StyleEmbeddingData
dataset · 66 downloads

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Natural Language Processing Techniques