Leveraging Discourse Information Effectively for Authorship Attribution
Su Wang, Elisa Ferracane, Raymond J. Mooney

TL;DR
This paper introduces a novel CNN-based method that effectively incorporates discourse features for authorship attribution, achieving state-of-the-art results and providing insights into when discourse features improve performance.
Contribution
It presents a new approach to embed discourse features in neural classifiers and analyzes their impact on authorship attribution accuracy.
Findings
Discourse features yield non-trivial gains in attribution accuracy under certain conditions, which the paper investigates empirically.
The proposed method outperforms the previous state of the art by a substantial margin.
The choice of featurization method affects how much discourse information helps.
Abstract
We explore techniques to maximize the effectiveness of discourse information in the task of authorship attribution. We present a novel method to embed discourse features in a Convolutional Neural Network text classifier, which achieves a state-of-the-art result by a substantial margin. We empirically investigate several featurization methods to understand the conditions under which discourse features contribute non-trivial performance gains, and analyze discourse embeddings.
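The idea of embedding discourse features inside a CNN text classifier can be illustrated with a toy forward pass. The following is a minimal sketch under stated assumptions, not the authors' architecture: the sizes, the random weights, the names `conv_max_pool` and `classify`, and the choice to average discourse embeddings and concatenate them with the pooled text features are all hypothetical and chosen only for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for this illustration.
VOCAB, DISC_TYPES = 50, 10      # text-token ids and discourse-feature ids
EMB, DISC_EMB = 8, 4            # embedding dimensions
N_FILTERS, WIDTH = 6, 3         # convolution filters and window width
N_AUTHORS = 5                   # number of candidate authors


def rand_matrix(rows, cols):
    """Small random weights standing in for trained parameters."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv_max_pool(token_vecs, filters, width):
    """1D convolution over the sequence, then ReLU and max-over-time pooling."""
    pooled = []
    for filt in filters:
        best = 0.0  # ReLU floor: negative activations never win the max
        for start in range(len(token_vecs) - width + 1):
            window = [x for vec in token_vecs[start:start + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        pooled.append(best)
    return pooled


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(token_ids, discourse_ids, params):
    """Return a probability distribution over authors for one document."""
    text_vecs = [params["tok_emb"][t] for t in token_ids]
    text_repr = conv_max_pool(text_vecs, params["filters"], WIDTH)

    # Assumption: discourse features are embedded and averaged per document.
    disc_vecs = [params["disc_emb"][d] for d in discourse_ids]
    if disc_vecs:
        disc_repr = [sum(v[i] for v in disc_vecs) / len(disc_vecs)
                     for i in range(DISC_EMB)]
    else:
        disc_repr = [0.0] * DISC_EMB

    # Assumption: the two views are joined by simple concatenation.
    joint = text_repr + disc_repr
    scores = [sum(w * x for w, x in zip(row, joint)) for row in params["out"]]
    return softmax(scores)


params = {
    "tok_emb": rand_matrix(VOCAB, EMB),
    "disc_emb": rand_matrix(DISC_TYPES, DISC_EMB),
    "filters": rand_matrix(N_FILTERS, WIDTH * EMB),
    "out": rand_matrix(N_AUTHORS, N_FILTERS + DISC_EMB),
}

# Toy document: six text-token ids and four discourse-feature ids.
probs = classify([1, 2, 3, 4, 5, 6], [0, 3, 3, 7], params)
print(probs)
```

With untrained weights the output is close to uniform; the sketch only shows where a discourse representation could enter the classifier alongside the convolutional text features.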
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Hate Speech and Cyberbullying Detection
