Character-level and Multi-channel Convolutional Neural Networks for Large-scale Authorship Attribution
Sebastian Ruder, Parsa Ghaffari, John G. Breslin

TL;DR
This paper explores the use of character-level and multi-channel CNNs for large-scale authorship attribution, demonstrating their effectiveness across multiple datasets and introducing their application to Reddit.
Contribution
It introduces CNN-based methods for authorship attribution that outperform traditional approaches and applies them to new large-scale datasets including Reddit.
Findings
Character-level CNNs outperform the state of the art on four of five datasets.
Multi-channel CNNs effectively leverage word and character signals.
First application of authorship attribution to Reddit data.
Abstract
Convolutional neural networks (CNNs) have demonstrated superior capability for extracting information from raw signals in computer vision. Recently, character-level and multi-channel CNNs have exhibited excellent performance for sentence classification tasks. We apply CNNs to large-scale authorship attribution, which aims to determine an unknown text's author among many candidate authors, motivated by their ability to process character-level signals and to differentiate between a large number of classes, while making fast predictions in comparison to state-of-the-art approaches. We extensively evaluate CNN-based approaches that leverage word and character channels and compare them against state-of-the-art methods for a large range of author numbers, shedding new light on traditional approaches. We show that character-level CNNs outperform the state-of-the-art on four out of five…
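To make the character-level idea concrete, here is a minimal sketch of how a character-level CNN might score candidate authors: characters are mapped to ids, a bank of 1D filters slides over the sequence, each filter's strongest response is kept (max-over-time pooling), and a softmax produces one probability per author. This is not the authors' architecture. The alphabet, filter count, filter width, and the names `encode`, `init_params`, and `forward` are all assumptions made for illustration, and the weights are random and untrained, so the output only demonstrates the data flow.

```python
import math
import random

# Hypothetical alphabet; the paper's actual character set is not given here.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'-"
CHAR_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(text, max_len=32):
    """Map text to a fixed-length list of character ids (-1 = unknown or padding)."""
    ids = [CHAR_TO_ID.get(ch, -1) for ch in text.lower()[:max_len]]
    return ids + [-1] * (max_len - len(ids))


def init_params(n_filters, width, n_authors, seed=0):
    """Create random (untrained) convolution filters and output weights."""
    rng = random.Random(seed)
    # Each filter holds `width` rows of per-character weights. With one-hot
    # character input, the convolution reduces to table lookups.
    filters = [
        [[rng.uniform(-0.1, 0.1) for _ in ALPHABET] for _ in range(width)]
        for _ in range(n_filters)
    ]
    out = [[rng.uniform(-0.1, 0.1) for _ in range(n_filters)] for _ in range(n_authors)]
    return filters, out


def forward(ids, filters, out):
    """Return a probability distribution over candidate authors."""
    width = len(filters[0])
    pooled = []
    for f in filters:
        best = 0.0  # starting at 0 applies ReLU before max-over-time pooling
        for start in range(len(ids) - width + 1):
            act = sum(
                f[k][ids[start + k]] for k in range(width) if ids[start + k] >= 0
            )
            best = max(best, act)
        pooled.append(best)
    # Linear layer followed by a numerically stable softmax over authors.
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in out]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


filters, out = init_params(n_filters=4, width=3, n_authors=5)
probs = forward(encode("Hello, world!"), filters, out)
```

A multi-channel variant would, under the same assumptions, run a second bank of filters over word-level input and concatenate the pooled features from both channels before the output layer. Prediction is a single forward pass, and supporting more candidate authors only adds rows to the output weights.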
Taxonomy
Topics: Authorship Attribution and Profiling · Names, Identity, and Discrimination Research · Natural Language Processing Techniques
