Can Authorship Representation Learning Capture Stylistic Features?
Andrew Wang, Cristina Aggazzotti, Rebecca Kotula, Rafael Rivera Soto, Marcus Bishop, Nicholas Andrews

TL;DR
This paper investigates whether authorship representations learned from large text corpora primarily encode writing style, using targeted experiments to validate their sensitivity to stylistic features and potential robustness to topic shifts.
Contribution
The study systematically probes authorship representations to confirm they mainly encode stylistic features, informing future stylistic applications and robustness considerations.
Findings
Authorship representations are sensitive to writing style.
Representations may be robust to topic drift over time.
Representations show potential for stylistic applications such as style transfer.
Abstract
Automatically disentangling an author's style from the content of their writing is a longstanding and possibly insurmountable problem in computational linguistics. At the same time, the availability of large text corpora furnished with author labels has recently enabled learning authorship representations in a purely data-driven manner for authorship attribution, a task that ostensibly depends to a greater extent on encoding writing style than encoding content. However, success on this surrogate task does not ensure that such representations capture writing style since authorship could also be correlated with other latent variables, such as topic. In an effort to better understand the nature of the information these representations convey, and specifically to validate the hypothesis that they chiefly encode writing style, we systematically probe these representations through a series of…
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Natural Language Processing Techniques
