Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

Jian Zhu; David Jurgens

arXiv:2109.03158·cs.CL·September 13, 2021

TL;DR

This paper introduces a neural approach to characterizing individual writing styles, revealing that idiolects are distinctive yet consistent, and quantifying linguistic contributions to style variation.

Contribution

It presents a novel neural method for learning and analyzing idiolects, demonstrating their regularities and the impact of linguistic elements on individual styles.

Findings

1. Neural models effectively identify authors from short texts.

2. Idiolects show consistent yet distinctive stylistic features.

3. Linguistic perturbation quantifies contributions to style variation.
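
The perturbation idea in the last finding can be illustrated with a minimal sketch. The three perturbations and the `relative_drop` helper below are hypothetical choices made for illustration; this page does not list the perturbations or the scoring used in the paper. The scores passed to `relative_drop` are assumed to come from some authorship model evaluated on the original and the perturbed text.

```python
import random
import string


def shuffle_word_order(text: str, seed: int = 0) -> str:
    """Destroy word order while keeping the same bag of words."""
    words = text.split()
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    rng.shuffle(words)
    return " ".join(words)


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation characters, leaving everything else intact."""
    return text.translate(str.maketrans("", "", string.punctuation))


def lowercase(text: str) -> str:
    """Remove capitalization cues."""
    return text.lower()


def relative_drop(score_original: float, score_perturbed: float) -> float:
    """Fraction of the original score lost after a perturbation.

    Hypothetical summary statistic: a larger drop suggests the removed
    element carried more of the stylistic signal.
    """
    if score_original == 0:
        raise ValueError("score_original must be non-zero")
    return (score_original - score_perturbed) / score_original
```

In use, each perturbation would be applied to the same held-out texts, the model re-scored, and the resulting drops compared across perturbation types.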

Abstract

An individual's variation in writing style is often a function of both social and personal attributes. While structured social variation, such as gender-based variation, has been studied extensively, far less is known about how to characterize individual styles because of their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features. The neural model achieves strong performance at authorship identification on short texts, and an analogy-based probing task shows that the learned representations exhibit surprising regularities that encode qualitative and quantitative shifts of idiolectal styles. Through text perturbation, we quantify the relative contributions of different linguistic elements to idiolectal variation. Furthermore, we provide a description of idiolects through measuring…
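
As a rough illustration of what an analogy-based probe over learned representations can look like, the sketch below uses the standard vector-offset formulation (b - a + c, then nearest neighbor by cosine similarity). This is an assumption about the general technique, not the paper's exact probing setup, which this page does not describe. The function names and the toy 2-D vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, candidates):
    """Return the candidate name closest to b - a + c.

    a -> b is a known style shift for one author; c is another author's
    unshifted text. If the embedding space encodes the shift consistently,
    the nearest candidate should be the second author's shifted text.
    """
    target = [bi - ai + ci for ai, bi, ci in zip(a, b, c)]
    return max(candidates, key=lambda name: cosine(target, candidates[name]))


# Toy embeddings (hypothetical): axis 0 ~ author identity, axis 1 ~ style shift.
a_plain, a_shifted = [1.0, 0.0], [1.0, 1.0]
b_plain = [-1.0, 0.0]
candidates = {
    "B_shifted": [-1.0, 1.0],
    "B_plain": [-1.0, 0.0],
    "A_shifted": [1.0, 1.0],
}
best = solve_analogy(a_plain, a_shifted, b_plain, candidates)
```

With these toy vectors the offset target is `[-1.0, 1.0]`, so `best` is `"B_shifted"`. Real embeddings would come from the trained model rather than hand-written lists.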

Code & Models

Repositories

lingjzhu/idiolect
PyTorch · Official

Taxonomy

Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques · Topic Modeling