Learning Stylometric Representations for Authorship Analysis

Steven H. H. Ding; Benjamin C. M. Fung; Farkhund Iqbal; William K. Cheung

arXiv:1606.01219·cs.CL·June 6, 2016

TL;DR

This paper introduces a neural-network-based method that combines several categories of linguistic features to learn stylometric representations for authorship analysis, outperforming bag-of-lexical-n-grams and embedding-based text representations.

Contribution

It proposes a novel approach integrating topical, lexical, syntactical, and character-level features into distributed representations for authorship tasks.
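The core idea of keeping separate feature categories and merging them into one document-level representation can be illustrated with a small sketch. It is not the authors' model: the paper learns its representations with a neural network from unlabeled text, whereas here each token simply receives a fixed pseudo-random vector (`embed`, a hypothetical stand-in for a learned embedding). The channel extractors are also assumptions: `syntactic_tokens` uses coarse token shapes in place of real part-of-speech tags, the topical channel is omitted, and concatenating per-channel averages is just one simple way to combine the channels.

```python
import random
import re
import zlib

DIM = 8  # hypothetical size of each channel's vector


def lexical_tokens(text):
    """Lexical channel: lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def char_ngrams(text, n=3):
    """Character channel: overlapping character n-grams (trigrams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def syntactic_tokens(text):
    """Hypothetical syntactic proxy: coarse token-shape classes standing in
    for real part-of-speech tags."""
    shapes = []
    for tok in re.findall(r"[A-Za-z]+|\d+|[^\w\s]", text):
        if tok.isdigit():
            shapes.append("NUM")
        elif tok.isalpha():
            shapes.append("CAP" if tok[0].isupper() else "LOW")
        else:
            shapes.append("PUNCT")
    return shapes


def embed(token, channel):
    """Hypothetical stand-in for a learned embedding: a fixed pseudo-random
    vector seeded by a checksum of (channel, token). Nothing is trained here."""
    seed = zlib.crc32(f"{channel}:{token}".encode("utf-8"))
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def channel_vector(tokens, channel):
    """Average the token vectors of one channel (all zeros if it is empty)."""
    total = [0.0] * DIM
    for tok in tokens:
        for i, value in enumerate(embed(tok, channel)):
            total[i] += value
    return [x / len(tokens) for x in total] if tokens else total


def stylometric_vector(text):
    """Concatenate one vector per feature category into a single representation."""
    return (channel_vector(lexical_tokens(text), "lex")
            + channel_vector(char_ngrams(text), "char")
            + channel_vector(syntactic_tokens(text), "syn"))


vec = stylometric_vector("Well, I never said that. Honestly!")
print(len(vec))  # → 24 (three channels of DIM = 8)
```

With trained embeddings in place of `embed`, each slice of the concatenated vector would still correspond to one feature category, so the categories could be inspected or weighted separately.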

Findings

01. Outperforms bag-of-lexical-n-grams as well as embedding-based representations
02. Effective on Twitter, novel, and essay datasets
03. Improves accuracy in authorship characterization and verification
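The bag-of-lexical-n-grams baseline from the first finding, and one way a document representation can be used for verification, can be sketched as follows. The block assumes word unigrams and bigrams with raw counts (no TF-IDF weighting) compared by cosine similarity; the `same_author` rule and its 0.5 threshold are hypothetical and do not reproduce the paper's evaluation protocol.

```python
from collections import Counter
import math
import re


def word_ngrams(text, n_max=2):
    """Bag of lexical n-grams: counts of word n-grams for n = 1..n_max."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            grams[" ".join(words[i:i + n])] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters).
    Missing keys in a Counter read as 0, so only shared n-grams contribute."""
    dot = sum(count * b[g] for g, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_author(doc_a, doc_b, threshold=0.5):
    """Hypothetical verification rule: accept if similarity reaches a threshold."""
    return cosine(word_ngrams(doc_a), word_ngrams(doc_b)) >= threshold


print(word_ngrams("the cat the cat"))
```

Swapping `word_ngrams` for a learned document vector would leave the comparison step conceptually unchanged, although `cosine` as written expects sparse `Counter` inputs rather than dense lists.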

Abstract

Authorship analysis (AA) is the study of unveiling the hidden properties of authors from an exponentially growing body of textual data. It extracts an author's identity and sociolinguistic characteristics from the writing styles reflected in the text. It is essential in various areas, such as cybercrime investigation, psycholinguistics, and political socialization. However, most previous techniques depend critically on manual feature engineering. Consequently, the choice of feature set has been shown to be scenario- or dataset-dependent. In this paper, to mimic the human sentence composition process using a neural network approach, we propose to incorporate different categories of linguistic features into distributed representations of words in order to simultaneously learn writing style representations from unlabeled texts for authorship…
