Character-level and Multi-channel Convolutional Neural Networks for Large-scale Authorship Attribution

Sebastian Ruder; Parsa Ghaffari; John G. Breslin

arXiv:1609.06686 · cs.CL · September 22, 2016 · 87 cites

TL;DR

This paper explores character-level and multi-channel CNNs for large-scale authorship attribution, demonstrates their effectiveness across multiple datasets, and presents the first application of authorship attribution to Reddit data.

Contribution

It applies CNN-based methods to authorship attribution, shows that character-level CNNs outperform state-of-the-art approaches on four of five datasets, and extends the evaluation to large-scale data, including Reddit.

Findings

1. Character-level CNNs outperform the state of the art on four of five datasets.
2. Multi-channel CNNs effectively leverage word and character signals.
3. First application of authorship attribution to Reddit data.
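To make the character-level idea concrete, here is a minimal, untrained forward pass in plain Python. It assumes a common text-CNN layout: character embeddings, parallel convolutions of several widths with ReLU, max-over-time pooling, and a softmax over candidate authors. The helper names (`make_params`, `conv_max_pool`, `predict`), the filter widths, and all sizes are hypothetical choices for illustration, not the authors' configuration, and the weights are random rather than learned.

```python
import math
import random

def make_params(vocab, emb_dim, widths, n_filters, n_authors, seed=0):
    """Randomly initialise toy, untrained parameters (hypothetical helper)."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.1, 0.1)

    emb = {c: [u() for _ in range(emb_dim)] for c in vocab}
    # One bank of n_filters filters per width; each filter is width x emb_dim plus a bias.
    banks = [[([[u() for _ in range(emb_dim)] for _ in range(w)], u())
              for _ in range(n_filters)] for w in widths]
    n_feats = len(widths) * n_filters
    out_w = [[u() for _ in range(n_feats)] for _ in range(n_authors)]
    out_b = [0.0] * n_authors
    return emb, banks, out_w, out_b

def conv_max_pool(seq, bank):
    """Valid 1-D convolution + ReLU, then max-over-time pooling (one value per filter)."""
    feats = []
    for kernel, bias in bank:
        width = len(kernel)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is the floor of the max
        for t in range(len(seq) - width + 1):
            s = bias + sum(k * x for row, vec in zip(kernel, seq[t:t + width])
                           for k, x in zip(row, vec))
            best = max(best, s)
        feats.append(best)
    return feats

def predict(text, params):
    """Return (most likely author index, softmax probabilities over authors)."""
    emb, banks, out_w, out_b = params
    dim = len(next(iter(emb.values())))
    unk = [0.0] * dim  # characters outside the vocabulary map to a zero vector
    seq = [emb.get(c, unk) for c in text.lower()]
    feats = [f for bank in banks for f in conv_max_pool(seq, bank)]
    logits = [b + sum(w * f for w, f in zip(row, feats))
              for row, b in zip(out_w, out_b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return max(range(len(probs)), key=probs.__getitem__), probs

params = make_params("abcdefghijklmnopqrstuvwxyz .,;!?'", emb_dim=8,
                     widths=(3, 4, 5), n_filters=4, n_authors=10)
author, probs = predict("Call me Ishmael.", params)
print(author, len(probs))
```

Because max-over-time pooling yields one value per filter regardless of text length, the output layer always sees a fixed-size feature vector, and supporting more candidate authors only widens that final layer.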

Abstract

Convolutional neural networks (CNNs) have demonstrated superior capability for extracting information from raw signals in computer vision. Recently, character-level and multi-channel CNNs have exhibited excellent performance for sentence classification tasks. We apply CNNs to large-scale authorship attribution, which aims to determine an unknown text's author among many candidate authors, motivated by their ability to process character-level signals and to differentiate between a large number of classes, while making fast predictions in comparison to state-of-the-art approaches. We extensively evaluate CNN-based approaches that leverage word and character channels and compare them against state-of-the-art methods for a large range of author numbers, shedding new light on traditional approaches. We show that character-level CNNs outperform the state-of-the-art on four out of five…
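The abstract mentions CNNs that leverage word and character channels. One simple way to combine two such channels is late fusion: convolve and max-pool each channel separately, then concatenate the pooled features before a shared softmax over authors. The sketch below assumes exactly that scheme. The fusion strategy, the word-bigram and character-5-gram filter widths, the dimensions, and every name (`make_bank`, `embed`, `conv_max_pool`, `classify`) are assumptions for illustration; the paper may combine channels differently, and the weights here are random rather than trained.

```python
import math
import random

rng = random.Random(1)
WORD_DIM, CHAR_DIM = 6, 4              # hypothetical embedding sizes
N_WORD_FILTERS, N_CHAR_FILTERS = 3, 5  # hypothetical filter counts
N_AUTHORS = 50

def rand_vec(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

def make_bank(dim, width, n_filters):
    """A bank of n_filters convolution filters, each width x dim, plus a bias."""
    return [([rand_vec(dim) for _ in range(width)], rng.uniform(-0.1, 0.1))
            for _ in range(n_filters)]

def embed(tokens, table, dim):
    """Look up embeddings; unseen tokens get a new random vector (toy choice)."""
    for tok in tokens:
        if tok not in table:
            table[tok] = rand_vec(dim)
    return [table[tok] for tok in tokens]

def conv_max_pool(seq, bank):
    """Valid 1-D convolution + ReLU, followed by max-over-time pooling."""
    feats = []
    for kernel, bias in bank:
        width = len(kernel)
        best = 0.0  # the max over ReLU outputs is never below 0
        for t in range(len(seq) - width + 1):
            s = bias + sum(k * x for row, vec in zip(kernel, seq[t:t + width])
                           for k, x in zip(row, vec))
            best = max(best, s)
        feats.append(best)
    return feats

word_table, char_table = {}, {}
word_bank = make_bank(WORD_DIM, 2, N_WORD_FILTERS)  # word-bigram filters
char_bank = make_bank(CHAR_DIM, 5, N_CHAR_FILTERS)  # character-5-gram filters
out_w = [rand_vec(N_WORD_FILTERS + N_CHAR_FILTERS) for _ in range(N_AUTHORS)]

def classify(text):
    """Return (author index, probabilities, fused features) for one text."""
    text = text.lower()
    words = embed(text.split(), word_table, WORD_DIM)
    chars = embed(list(text), char_table, CHAR_DIM)
    # Late fusion (assumption): pool each channel separately, then concatenate.
    feats = conv_max_pool(words, word_bank) + conv_max_pool(chars, char_bank)
    logits = [sum(w * f for w, f in zip(row, feats)) for row in out_w]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return max(range(N_AUTHORS), key=probs.__getitem__), probs, feats

author, probs, feats = classify("It was the best of times, it was the worst of times")
print(author, len(feats))
```

Concatenation keeps each channel's pooled features in separate positions, so the output layer can weight word-level and character-level evidence independently.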

Taxonomy

Topics: Authorship Attribution and Profiling · Names, Identity, and Discrimination Research · Natural Language Processing Techniques