Authorship Attribution in Bangla literature using Character-level CNN
Aisha Khatun, Anisur Rahman, Md. Saiful Islam, Marium-E-Jannat

TL;DR
This study explores character-level CNNs for authorship attribution in Bangla literature. The proposed model is more time- and memory-efficient than word-level models but 2-5% less accurate than the best of them. Pre-training the character embeddings improves accuracy by up to 10%, and performance improves with larger datasets.
Contribution
The paper introduces a character-level CNN approach for Bangla authorship attribution and shows how pre-training and larger datasets improve its performance.
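As a rough illustration of what "character-level" input means here, the sketch below turns a text into a fixed-length sequence of character ids, the usual input to an embedding layer followed by convolutions. It is not the authors' preprocessing: the function names `build_vocab` and `encode`, the reserved ids 0 (padding) and 1 (unknown), and the `max_len` parameter are all assumptions made for this example.

```python
def build_vocab(texts):
    """Assign an integer id to every distinct character in the corpus.

    Ids 0 and 1 are reserved (an assumption of this sketch):
    0 = padding, 1 = unknown character.
    """
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text, vocab, max_len):
    """Convert text to exactly max_len ids, truncating or right-padding with 0."""
    ids = [vocab.get(ch, 1) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Hypothetical two-sample corpus, for illustration only.
corpus = ["আমি", "বই"]
vocab = build_vocab(corpus)
encoded = encode("আমি", vocab, max_len=5)
```

Python strings iterate over Unicode code points, so Bangla vowel signs and other combining marks each receive their own id in this sketch. Whether the paper treats them that way is not stated in the abstract.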
Findings
Character-level CNNs are more time- and memory-efficient than word-level models for Bangla authorship attribution, at a cost of 2-5% accuracy relative to the best word-level models.
Pre-training character embeddings improves accuracy by up to 10%.
Model performance increases with larger datasets.
Abstract
Characters are the smallest unit of text from which stylometric signals can be extracted to determine the author of a text. In this paper, we investigate the effectiveness of character-level signals in authorship attribution of Bangla literature and show that the results are promising but improvable. The proposed model is much more time- and memory-efficient than its word-level counterparts, but its accuracy is 2-5% lower than that of the best-performing word-level models. We compare it against various word-based models and show that it performs increasingly better with larger datasets. We also analyze the effect of pre-training character embeddings for the diverse Bangla character set on authorship attribution. Pre-training improves performance by up to 10%. We use two datasets, with between 6 and 14 authors, balance them before training, and compare the results.
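The abstract says the datasets are balanced before training but does not say how. One common option, shown here purely as an assumption, is to downsample every author to the size of the least-represented author. The function name `balance_by_author`, the `(text, author)` tuple layout, and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_author(samples, seed=0):
    """Downsample (text, author) pairs so each author has equally many samples.

    Assumed strategy: keep as many samples per author as the
    least-represented author has. Raises ValueError on empty input.
    """
    by_author = defaultdict(list)
    for text, author in samples:
        by_author[author].append(text)

    target = min(len(texts) for texts in by_author.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    balanced = []
    for author, texts in by_author.items():
        for text in rng.sample(texts, target):
            balanced.append((text, author))
    return balanced


# Hypothetical toy data: author "A" has three samples, author "B" has one.
toy = [("t1", "A"), ("t2", "A"), ("t3", "A"), ("t4", "B")]
balanced = balance_by_author(toy)
```

Downsampling discards data, which matters given that the model is reported to do better with larger datasets. Oversampling or class weighting are alternatives, and nothing in the abstract indicates which approach the authors took.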
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Natural Language Processing Techniques
