BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts

Raisa Tasnim; Mehanaz Chowdhury; Md Ataur Rahman

arXiv:2412.02058·cs.CL·December 4, 2024

TL;DR

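This paper introduces BN-AuthProf, a dataset of Bangla social media posts labeled with author age and gender, benchmarks classical machine learning and deep learning methods on it, and reports best accuracies of 80% for gender classification (Support Vector Machine) and 91% for age classification (Multinomial Naive Bayes).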

Contribution

The paper creates and evaluates BN-AuthProf, described as the first comprehensive Bangla author profiling dataset (30,131 posts from 300 authors), and benchmarks multiple machine learning techniques for predicting author age and gender.

Findings

01

Support Vector Machine achieved 80% accuracy in gender classification.

02

Multinomial Naive Bayes achieved 91% accuracy in age classification.

03

Taken together, these results indicate that machine learning methods can be effective for Bangla author profiling on this dataset.
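To make the age-classification finding more concrete, here is a minimal sketch of a Multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. It is an illustration under stated assumptions, not the authors' pipeline: the class name `TinyMultinomialNB`, the whitespace tokenizer, the toy English placeholder texts, and the "younger"/"older" labels are all hypothetical and do not come from BN-AuthProf.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing.

    Hypothetical helper for illustration; not the paper's implementation.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.split()
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(token | class) for each class."""
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_tokens = sum(self.token_counts[label].values())
            score = math.log(n_docs / self.total_docs)
            for tok in text.split():
                if tok not in self.vocab:
                    continue  # skip tokens never seen during training
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the gold label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Made-up toy data (assumption): placeholder tokens and labels only.
train_texts = [
    "exam class homework exam",
    "class friends exam",
    "office salary meeting",
    "meeting office tax salary",
]
train_labels = ["younger", "younger", "older", "older"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
```

Calling `model.predict("exam homework")` on this toy data selects the class whose training texts contain those tokens most often, and `accuracy(model, texts, labels)` computes the same kind of metric the findings report. A real experiment would need Bangla-aware tokenization and a held-out test split, neither of which is shown here.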

Abstract

Author profiling, the analysis of texts to uncover attributes such as gender and age of the author, has become essential with the widespread use of social media platforms. This paper focuses on author profiling in the Bangla language, aiming to extract valuable insights about anonymous authors based on their writing style on social media. The primary objective is to introduce and benchmark the performance of machine learning approaches on a newly created Bangla Author Profiling dataset, BN-AuthProf. The dataset comprises 30,131 social media posts from 300 authors, labeled by their age and gender. Authors' identities and sensitive information were anonymized to ensure privacy. Various classical machine learning and deep learning techniques were employed to evaluate the dataset. For gender classification, the best accuracy achieved was 80% using Support Vector Machine (SVM), while a…

Code & Models

Repositories

crusnic-corp/BN-AuthProf (Official)

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques