Bangla Hate Speech Classification with Fine-tuned Transformer Models
Yalda Keivan Jafari, Krishno Dey

TL;DR
This paper evaluates transformer-based models for Bangla hate speech detection, demonstrating that language-specific pre-trained models like BanglaBERT outperform multilingual models and traditional baselines, highlighting the importance of tailored resources for low-resource languages.
Contribution
The paper applies transformer models, especially BanglaBERT, to hate speech classification in Bangla and compares their performance against baselines, emphasizing the value of language-specific pre-training.
Findings
BanglaBERT outperforms other transformer models and baselines.
Transformer models generally outperform traditional machine learning methods.
Language-specific pre-training significantly improves hate speech detection in Bangla.
Abstract
Hate speech recognition in low-resource languages remains a difficult problem due to insufficient datasets, orthographic heterogeneity, and linguistic variety. Bangla is spoken by more than 230 million people in Bangladesh and India (West Bengal). Despite the growing need for automated moderation on social media platforms, Bangla remains significantly under-represented in computational resources. In this work, we study Subtask 1A and Subtask 1B of the BLP 2025 Shared Task on hate speech detection. We reproduce the official baselines (e.g., Majority, Random, Support Vector Machine) and add Logistic Regression, Random Forest, and Decision Tree as further baselines. We also apply transformer-based models such as DistilBERT, BanglaBERT, m-BERT, and XLM-RoBERTa to hate speech classification. All the transformer-based models outperformed baseline methods for the subtasks,…
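As a rough illustration of the two simplest baselines named in the abstract (Majority and Random), the following sketch may help. It is not the shared task's official implementation: the label names, the toy data, the uniform sampling in the random baseline, and the use of plain accuracy as the metric are all assumptions made for this example.

```python
import random
from collections import Counter


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_test


def random_baseline(train_labels, n_test, seed=0):
    """Predict a label drawn uniformly from the set of training labels.

    Uniform sampling is an assumption; a baseline could also sample
    according to the training label frequencies.
    """
    rng = random.Random(seed)
    label_set = sorted(set(train_labels))
    return [rng.choice(label_set) for _ in range(n_test)]


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical toy labels, not the label set of the shared task.
train_labels = ["none", "hate_a", "none", "hate_b", "none", "hate_a"]
test_labels = ["none", "hate_a", "none", "none"]

majority_pred = majority_baseline(train_labels, len(test_labels))
random_pred = random_baseline(train_labels, len(test_labels))

print("majority accuracy:", accuracy(test_labels, majority_pred))
print("random accuracy:", accuracy(test_labels, random_pred))
```

Baselines of this kind give a floor for comparison: a trained classifier that cannot clearly beat the majority prediction has learned little beyond the label distribution.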
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition
