Leveraging Discourse Information Effectively for Authorship Attribution

Su Wang; Elisa Ferracane; Raymond J. Mooney

arXiv:1709.02271 · cs.CL · September 8, 2017 · 6 cites

TL;DR

This paper introduces a novel CNN-based method that effectively incorporates discourse features for authorship attribution, achieving state-of-the-art results and providing insights into when discourse features improve performance.

Contribution

The paper presents a new approach for embedding discourse features in neural classifiers and analyzes their impact on authorship attribution accuracy.

Findings

1. Discourse embeddings yield non-trivial gains in attribution accuracy under certain conditions.
2. The proposed method outperforms previous state-of-the-art models by a substantial margin.
3. The choice of featurization method influences how much discourse information helps.

Abstract

We explore techniques to maximize the effectiveness of discourse information in the task of authorship attribution. We present a novel method to embed discourse features in a Convolutional Neural Network text classifier, which achieves a state-of-the-art result by a substantial margin. We empirically investigate several featurization methods to understand the conditions under which discourse features contribute non-trivial performance gains, and analyze discourse embeddings.
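
The abstract describes embedding discourse features in a CNN text classifier but gives no architectural details. The Python sketch below shows one plausible way to do this, assuming each document arrives with a sequence of categorical discourse labels. Words and discourse labels are embedded separately, and each sequence passes through a 1-D convolution with ReLU and max-over-time pooling. The two pooled vectors are then concatenated before a softmax over candidate authors. The class name DiscourseCNN, the label strings such as "SO" and "X-", and all sizes are hypothetical. The weights are random and untrained, so this illustrates only the forward pass. It is not the authors' architecture and does not reproduce their results.

```python
import math
import random


def make_embeddings(symbols, dim, rng):
    """Map each symbol (word or discourse label) to a small random vector."""
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for s in symbols}


def make_filters(n_filters, width, dim, rng):
    """Create n_filters convolution filters, each spanning `width` vectors of size `dim`."""
    return [([rng.uniform(-0.1, 0.1) for _ in range(width * dim)], 0.0)
            for _ in range(n_filters)]


def conv_max_pool(seq, filters, width):
    """1-D convolution over a non-empty list of vectors, ReLU, then max-over-time pooling.

    Each filter is (weights, bias) with len(weights) == width * len(seq[0]).
    Returns one pooled value per filter.
    """
    dim = len(seq[0])
    if len(seq) < width:
        # Zero-pad short sequences so at least one full window exists.
        seq = seq + [[0.0] * dim for _ in range(width - len(seq))]
    pooled = []
    for weights, bias in filters:
        best = 0.0  # starting at 0.0 folds the ReLU into the max
        for start in range(len(seq) - width + 1):
            window = [x for vec in seq[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(weights, window)) + bias
            best = max(best, activation)
        pooled.append(best)
    return pooled


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DiscourseCNN:
    """Hypothetical two-channel CNN: words and discourse labels are embedded,
    convolved and pooled separately, then concatenated for author prediction."""

    def __init__(self, vocab, discourse_labels, n_authors,
                 dim=8, n_filters=4, width=2, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.word_emb = make_embeddings(vocab, dim, rng)
        self.disc_emb = make_embeddings(discourse_labels, dim, rng)
        self.word_filters = make_filters(n_filters, width, dim, rng)
        self.disc_filters = make_filters(n_filters, width, dim, rng)
        # Output layer: one weight row per candidate author over both channels.
        self.out_w = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_filters)]
                      for _ in range(n_authors)]
        self.out_b = [0.0] * n_authors
        self.unk = [0.0] * dim  # unknown words/labels map to a zero vector

    def predict_proba(self, tokens, discourse_seq):
        words = [self.word_emb.get(t, self.unk) for t in tokens]
        labels = [self.disc_emb.get(d, self.unk) for d in discourse_seq]
        # Concatenate pooled word features with pooled discourse features.
        features = (conv_max_pool(words, self.word_filters, self.width)
                    + conv_max_pool(labels, self.disc_filters, self.width))
        logits = [sum(w * f for w, f in zip(row, features)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        return softmax(logits)


model = DiscourseCNN(vocab=["the", "cat", "sat", "on", "mat"],
                     discourse_labels=["SS", "SO", "OX", "X-"],  # hypothetical labels
                     n_authors=3)
probs = model.predict_proba(["the", "cat", "sat", "on", "the", "mat"],
                            ["SO", "OX", "X-"])
print(len(probs), round(sum(probs), 6))  # prints: 3 1.0
```

A separate convolution for the discourse channel can pick up patterns over label sequences independently of word n-grams. Concatenating the two channels just before the output layer is one simple fusion choice among several. Another option would be to concatenate word and discourse embeddings at the input.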

Code & Models

Repositories

elisaF/authorship-attribution-discourse (official)

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Hate Speech and Cyberbullying Detection