Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles
Jian Zhu, David Jurgens

TL;DR
This paper introduces a neural approach to characterizing individual writing styles (idiolects), shows that they are distinctive yet consistent, and quantifies how much different linguistic elements contribute to stylistic variation.
Contribution
The paper presents a neural method for learning and analyzing idiolects, shows that the learned representations exhibit regularities, and measures how different linguistic elements contribute to individual styles.
Findings
Neural models effectively identify authors from short texts.
Idiolects show consistent yet distinctive stylistic features.
Text perturbation quantifies the relative contributions of different linguistic elements to idiolectal variation.
Abstract
An individual's variation in writing style is often a function of both social and personal attributes. While structured social variation has been extensively studied, e.g., gender-based variation, far less is known about how to characterize individual styles due to their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features. The neural model achieves strong performance at authorship identification on short texts, and an analogy-based probing task shows that the learned representations exhibit surprising regularities that encode qualitative and quantitative shifts of idiolectal styles. Through text perturbation, we quantify the relative contributions of different linguistic elements to idiolectal variation. Furthermore, we provide a description of idiolects through measuring…
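The analogy-based probing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `analogy_probe`), the toy two-dimensional vectors, and the labels are all hypothetical, and the paper's actual probing setup is not described in this summary. The sketch only shows the general technique of testing whether a style shift observed for one author (b - a) transfers to another author (c) by vector arithmetic and nearest-neighbour lookup under cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def analogy_probe(emb, a, b, c):
    """Return the key whose vector is closest to emb[b] - emb[a] + emb[c].

    The three query keys are excluded from the candidates, as is usual
    in analogy tests.
    """
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = {k: v for k, v in emb.items() if k not in (a, b, c)}
    return max(candidates, key=lambda k: cosine(candidates[k], target))


# Hypothetical toy embeddings (assumption, not data from the paper):
# axis 0 stands in for author identity, axis 1 for some stylistic shift.
toy = {
    "author1_plain": [1.0, 0.0],
    "author1_shifted": [1.0, 1.0],
    "author2_plain": [-1.0, 0.0],
    "author2_shifted": [-1.0, 1.0],
    "author3_plain": [0.0, -1.0],
}

result = analogy_probe(toy, "author1_plain", "author1_shifted", "author2_plain")
print(result)
```

In this toy setup the probe succeeds when the returned key is the shifted text of the second author, which is what a regular, linearly encoded style shift would predict.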
Taxonomy
Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques · Topic Modeling
