BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts
Raisa Tasnim, Mehanaz Chowdhury, Md Ataur Rahman

TL;DR
This paper introduces BN-AuthProf, a dataset of Bangla social media posts labeled with author age and gender, and benchmarks machine learning methods for author profiling on it, reporting up to 80% accuracy for gender classification and 91% for age classification.
Contribution
It creates and evaluates the first comprehensive Bangla author profiling dataset and benchmarks multiple machine learning techniques for demographic prediction.
Findings
Support Vector Machine achieved 80% accuracy in gender classification.
Multinomial Naive Bayes achieved 91% accuracy in age classification.
The study demonstrates machine learning's effectiveness for Bangla author profiling.
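As a rough illustration of one of the classifiers named in these findings, the following is a minimal, from-scratch sketch of Multinomial Naive Bayes with Laplace smoothing over bag-of-words counts. It is not the authors' pipeline: the whitespace tokenizer, the smoothing value `alpha=1.0`, the class labels `group_a` and `group_b`, and the toy English sentences are all assumptions made for this example, and the summary does not state which features or age groups the paper used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization (hypothetical preprocessing).
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {
            c: sum(self.token_counts[c].values()) for c in self.class_counts
        }
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.total_tokens[c] + self.alpha * vocab_size
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.token_counts[c][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy placeholder data, not drawn from BN-AuthProf.
train_texts = [
    "exam tomorrow campus friends",
    "campus exam notes friends",
    "office meeting salary family",
    "family office salary tax",
]
train_labels = ["group_a", "group_a", "group_b", "group_b"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("exam friends"))
print(model.predict("office tax"))
```

In practice one would more likely use an existing library implementation and evaluate on held-out authors; the sketch only shows the scoring rule behind the classifier.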
Abstract
Author profiling, the analysis of texts to uncover attributes such as gender and age of the author, has become essential with the widespread use of social media platforms. This paper focuses on author profiling in the Bangla language, aiming to extract valuable insights about anonymous authors based on their writing style on social media. The primary objective is to introduce and benchmark the performance of machine learning approaches on a newly created Bangla Author Profiling dataset, BN-AuthProf. The dataset comprises 30,131 social media posts from 300 authors, labeled by their age and gender. Authors' identities and sensitive information were anonymized to ensure privacy. Various classical machine learning and deep learning techniques were employed to evaluate the dataset. For gender classification, the best accuracy achieved was 80% using Support Vector Machine (SVM), while a…
Taxonomy
Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques
