Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution
Abiodun Modupe, Turgay Celik, Vukosi Marivate, Oludayo O. Olugbara

TL;DR
This paper proposes a novel deep learning approach that combines a bidirectional long short-term memory (BLSTM) network and a convolutional neural network (CNN) with subword embeddings, improving authorship attribution by capturing both the sequential and the stylistic features of text.
Contribution
It introduces a hybrid BLSTM-CNN model with subword embeddings that addresses the ambiguity of hidden words (words outside the training vocabulary) while preserving the sequential context of words.
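Subword embeddings help with hidden words because a word missing from the training vocabulary can still be represented through its pieces. The following is a minimal sketch of one common way to do this, a fastText-style scheme. This is an assumption for illustration, since the summary does not specify the paper's exact subword method. Each word is wrapped in `<` and `>` markers and split into character n-grams of length 3 to 5. Each n-gram is hashed to a bucket, and the word vector is the average of the bucket vectors. The names `char_ngrams`, `fnv1a`, and `SubwordEmbedding`, the sizes, and the random untrained vectors are all hypothetical.

```python
import random
from typing import List

# Hypothetical settings (not taken from the paper): fastText-style character
# n-grams of length 3 to 5, hashed into a fixed number of buckets.
MIN_N, MAX_N = 3, 5
NUM_BUCKETS = 2 ** 16
DIM = 8


def char_ngrams(word: str, min_n: int = MIN_N, max_n: int = MAX_N) -> List[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    grams = [marked[i:i + n]
             for n in range(min_n, max_n + 1)
             for i in range(len(marked) - n + 1)]
    return grams or [marked]  # very short words fall back to the whole marked string


def fnv1a(text: str) -> int:
    """32-bit FNV-1a hash, deterministic across runs (unlike Python's built-in hash)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h


class SubwordEmbedding:
    """Toy table with one vector per hashed n-gram bucket (random, untrained)."""

    def __init__(self, dim: int = DIM, num_buckets: int = NUM_BUCKETS, seed: int = 0):
        self.dim, self.num_buckets, self.seed = dim, num_buckets, seed

    def bucket_vector(self, bucket: int) -> List[float]:
        # Seeding per bucket makes each vector reproducible regardless of lookup order.
        rng = random.Random(self.seed * self.num_buckets + bucket)
        return [rng.uniform(-0.1, 0.1) for _ in range(self.dim)]

    def embed(self, word: str) -> List[float]:
        """Average the n-gram vectors, so even a word unseen in training gets a vector."""
        grams = char_ngrams(word)
        total = [0.0] * self.dim
        for gram in grams:
            vec = self.bucket_vector(fnv1a(gram) % self.num_buckets)
            total = [t + v for t, v in zip(total, vec)]
        return [t / len(grams) for t in total]


emb = SubwordEmbedding()
print(char_ngrams("cat"))  # ['<ca', 'cat', 'at>', '<cat', 'cat>', '<cat>']
print(len(emb.embed("misattributed")))  # 8
```

Because "authorship" and "coauthored" share n-grams such as `aut`, `uth`, and `auth`, their vectors draw on some of the same buckets. In a trained model, this is how a rare or unseen word form inherits information from related words.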
Findings
Improved accuracy by more than 1% on the CCAT50 and Twitter datasets.
Produced comparable results on the IMDb62 and Blog50 datasets.
Demonstrated the effectiveness of combining BLSTM and CNN for stylistic feature extraction.
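The following dependency-free Python sketch illustrates the forward pass of a BLSTM-CNN hybrid. It is a minimal sketch under stated assumptions, not the authors' model. It assumes the CNN (filters of width 3, ReLU, max-over-time pooling) runs over the concatenated forward and backward LSTM states. It also uses random untrained weights and random stand-in word vectors. All names and sizes (`bilstm`, `conv_maxpool`, `author_probabilities`, `HID`, `NUM_AUTHORS`, and so on) are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical arrangement: CNN filters slide over the concatenated BLSTM
# states; the paper's actual layer order and sizes may differ.


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """One-direction LSTM; gate pre-activations are stacked as [i; f; o; g]."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.n = hidden_dim
        self.W = rand_matrix(4 * hidden_dim, input_dim, rng)
        self.U = rand_matrix(4 * hidden_dim, hidden_dim, rng)
        self.b = [0.0] * (4 * hidden_dim)

    def run(self, xs: List[Vector]) -> List[Vector]:
        n = self.n
        h, c = [0.0] * n, [0.0] * n
        outputs = []
        for x in xs:
            z = [wx + uh + bias for wx, uh, bias in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
            i = [sigmoid(v) for v in z[:n]]           # input gate
            f = [sigmoid(v) for v in z[n:2 * n]]      # forget gate
            o = [sigmoid(v) for v in z[2 * n:3 * n]]  # output gate
            g = [math.tanh(v) for v in z[3 * n:]]     # candidate cell state
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm(fwd: LSTM, bwd: LSTM, xs: List[Vector]) -> List[Vector]:
    """Concatenate forward states with backward states realigned to the same position."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def conv_maxpool(seq: List[Vector], filters: List[Matrix]) -> Vector:
    """Valid 1-D convolution over time, ReLU, then max-over-time pooling (one value per filter)."""
    pooled = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # starting at 0.0 makes max(best, s) equal the max of ReLU(s)
        for t in range(len(seq) - width + 1):
            s = sum(w * x for k in range(width) for w, x in zip(filt[k], seq[t + k]))
            best = max(best, s)
        pooled.append(best)
    return pooled


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def author_probabilities(xs: List[Vector], fwd: LSTM, bwd: LSTM,
                         filters: List[Matrix], out_W: Matrix) -> Vector:
    """Word vectors -> BLSTM states -> CNN features -> softmax over candidate authors."""
    features = conv_maxpool(bilstm(fwd, bwd, xs), filters)
    return softmax(matvec(out_W, features))


# Toy usage with random, untrained weights and random stand-in word vectors.
rng = random.Random(0)
EMB, HID, NUM_FILTERS, WIDTH, NUM_AUTHORS = 8, 6, 4, 3, 5
fwd, bwd = LSTM(EMB, HID, rng), LSTM(EMB, HID, rng)
filters = [rand_matrix(WIDTH, 2 * HID, rng) for _ in range(NUM_FILTERS)]
out_W = rand_matrix(NUM_AUTHORS, NUM_FILTERS, rng)
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(EMB)] for _ in range(10)]
probs = author_probabilities(tokens, fwd, bwd, filters, out_W)
print("predicted author index:", probs.index(max(probs)))
```

The BLSTM half gives each position context from both directions, and max-over-time pooling keeps the strongest local pattern each filter finds, wherever in the text it occurs. A real system would learn all weights by backpropagation, which this sketch omits.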
Abstract
The problem of identifying the author of a given text document from multiple candidate authors is called authorship attribution. Many word-based stylistic markers have been used successfully in deep learning methods to deal with the intrinsic problem of authorship attribution. Unfortunately, the performance of word-based authorship attribution systems is limited by the vocabulary of the training corpus. The literature has recommended character-based stylistic markers as an alternative to overcome the hidden word problem. However, character-based methods often fail to capture the sequential relationship of words in texts, which leaves a gap for further improvement. The question addressed in this paper is whether it is possible to address the ambiguity of hidden words in text documents while preserving the sequential context of words. Consequently, a method based on bidirectional long…
Taxonomy
Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Topic Modeling
