Inference of Fine-grained Attributes of Bengali Corpus for Stylometry Detection
Tanmoy Chakraborty, Sivaji Bandyopadhyay

TL;DR
This paper introduces a fully automatic, language-independent method for Bengali stylometry detection using fine-grained attributes, lexical markers, and semi-supervised measures, achieving promising accuracy.
Contribution
It proposes a novel approach for Bengali stylometry detection employing fine-grained features and semi-supervised measures, advancing language-independent authorship analysis.
Findings
Achieved reasonably promising accuracy in comparison to the baseline model.
Developed a language-independent, fully automatic stylometry system.
Utilized fine-grained attribute features and lexical markers, with three semi-supervised decision measures combined by majority voting.
Abstract
Stylometry, the science of inferring characteristics of an author from the characteristics of documents written by that author, is a problem with a long history. It belongs to the core task of text categorization, which involves authorship identification, plagiarism detection, forensic investigation, computer security, copyright and estate disputes, etc. In this work, we present a strategy for stylometry detection of documents written in Bengali. We adopt a set of fine-grained attribute features together with a set of lexical markers for the analysis of the text and use three semi-supervised measures for making decisions. A majority voting approach is then used for the final classification. The system is fully automatic and language-independent. Evaluation results for Bengali authors' stylometry detection show reasonably promising accuracy in comparison to the baseline model.
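The final majority-voting step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the three semi-supervised measures or say how ties are resolved, so the function name `majority_vote`, the author labels, and the rule of returning `None` when no label wins a strict majority are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(predictions: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of the measures, else None.

    Assumption: a three-way disagreement is left undecided (None); the
    paper's abstract does not state how such cases are handled.
    """
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    # Require more than half of the votes; otherwise there is no majority.
    return label if count > len(predictions) / 2 else None


# Hypothetical outputs of three decision measures for one test document.
votes = ["author_A", "author_B", "author_A"]
print(majority_vote(votes))
```

With three measures, a label needs at least two votes to be selected, so a single dissenting measure cannot change the decision.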
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Hate Speech and Cyberbullying Detection
