Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution

Abiodun Modupe; Turgay Celik; Vukosi Marivate; Oludayo O. Olugbara

arXiv:2306.14933 · cs.CL · June 28, 2023

Open Access

TL;DR

This paper proposes a deep learning approach that combines a bidirectional long short-term memory (BLSTM) network and a convolutional neural network (CNN) with subword embeddings, aiming to improve authorship attribution by capturing both sequential context and stylistic features.

Contribution

It introduces a hybrid BLSTM-CNN model with subword embeddings that addresses the ambiguity of hidden words while preserving the sequential context of words in authorship attribution.

Findings

01. Achieved over 1% accuracy improvement on CCAT50 and Twitter datasets.

02. Produced comparable results on IMDb62 and Blog50 datasets.

03. Demonstrated the effectiveness of combining BLSTM and CNN for stylistic feature extraction.

Abstract

The problem of identifying the author of a given text document from multiple candidate authors is called authorship attribution. Many word-based stylistic markers have been used successfully in deep learning methods to address this problem. Unfortunately, the performance of word-based authorship attribution systems is limited by the vocabulary of the training corpus. The literature recommends character-based stylistic markers as an alternative that overcomes the hidden word problem. However, character-based methods often fail to capture the sequential relationship of words in texts, which leaves a gap for further improvement. The question addressed in this paper is whether it is possible to address the ambiguity of hidden words in text documents while preserving the sequential context of words. Consequently, a method based on bidirectional long…
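
To make the hidden word problem concrete, the following is a minimal sketch of how subword units can cover a word that never appeared in the training vocabulary. It assumes that "hidden words" are words absent from the training corpus, and it uses fastText-style character n-grams purely as a stand-in; the subword scheme the paper actually uses is not stated in this excerpt. The function names (`char_ngrams`, `build_subword_vocab`, `coverage`) and the toy word list are hypothetical.

```python
from collections import Counter


def char_ngrams(word, n_min=3, n_max=5):
    """Return character n-grams of a word, using '<' and '>' as boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def build_subword_vocab(corpus_words, n_min=3, n_max=5):
    """Count every character n-gram seen in the training words."""
    vocab = Counter()
    for word in corpus_words:
        vocab.update(char_ngrams(word, n_min, n_max))
    return vocab


def coverage(word, word_vocab, subword_vocab):
    """Report whether a word is known as a whole and how many of its n-grams are known."""
    grams = char_ngrams(word)
    known = [g for g in grams if g in subword_vocab]
    return {
        "in_word_vocab": word in word_vocab,
        "subword_coverage": len(known) / len(grams) if grams else 0.0,
        "known": known,
    }


# Toy training vocabulary (an assumption for illustration, not data from the paper).
training_words = ["writing", "writer", "stylistic", "style"]
word_vocab = set(training_words)
subword_vocab = build_subword_vocab(training_words)

# "writers" is absent from the word vocabulary, yet most of its n-grams were seen.
report = coverage("writers", word_vocab, subword_vocab)
print(report["in_word_vocab"], round(report["subword_coverage"], 2))
```

A word-level model would map "writers" to a single unknown token, whereas a subword representation can still build on fragments shared with "writer" and "writing". In a setup like the one the abstract describes, such subword units would then be embedded and passed in order to a sequence model so that word order is preserved.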

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Topic Modeling