Hate Speech detection in the Bengali language: A dataset and its baseline evaluation
Nauros Romim, Mosahed Ahmed, Hriteshwar Talukder, Md Saiful Islam

TL;DR
This paper introduces a new Bengali hate speech dataset from social media comments, along with baseline evaluations of various deep learning models, to advance research in hate speech detection for this language.
Contribution
The paper provides the first large-scale, annotated Bengali hate speech dataset and baseline results using multiple models, addressing the lack of resources for this language.
Findings
SVM achieved 87.5% accuracy in baseline tests.
Deep learning models performed well on the dataset.
The dataset covers seven categories of comments.
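As a rough illustration of what an SVM baseline for this kind of binary comment classification can look like, here is a minimal sketch using scikit-learn. This summary does not state the paper's features or hyperparameters, so the character n-gram TF-IDF features, the `C` value, the helper names `build_baseline` and `accuracy`, and the toy English comments are all assumptions made for illustration. It is not the authors' pipeline and will not reproduce the reported 87.5%.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy placeholder comments, invented for illustration (NOT from the dataset).
# Label 1 = hate, 0 = not hate.
train_texts = [
    "you people are disgusting and should leave",
    "i hate this whole group of people",
    "those people are trash and worthless",
    "what a great match today",
    "i loved the new song",
    "thanks for sharing this video",
]
train_labels = [1, 1, 1, 0, 0, 0]


def build_baseline():
    """Hypothetical baseline: character n-gram TF-IDF fed into a linear SVM.

    Character n-grams are an assumption; they avoid needing a
    language-specific tokenizer, which is convenient for Bengali text.
    """
    return make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LinearSVC(C=1.0),  # C=1.0 is an assumed default, not a tuned value
    )


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    predictions = model.predict(texts)
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


model = build_baseline()
model.fit(train_texts, train_labels)

# Accuracy on the training data only; a real evaluation needs a held-out split.
train_accuracy = accuracy(model, train_texts, train_labels)
new_prediction = model.predict(["what a lovely video"])
```

On the real dataset, the toy lists would be replaced by the 30,000 labelled comments, and accuracy would be measured on a held-out test split rather than on the training data.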
Abstract
Social media sites such as YouTube and Facebook have become an integral part of everyone's life, and in the last few years hate speech in social media comment sections has increased rapidly. Detection of hate speech on social media websites faces a variety of challenges, including small imbalanced datasets, finding an appropriate model, and the choice of feature analysis method. Furthermore, this problem is more severe for the Bengali-speaking community due to the lack of gold-standard labelled datasets. This paper presents a new dataset of 30,000 user comments tagged by crowdsourcing and verified by experts. All the comments were collected from YouTube and Facebook comment sections and classified into seven categories: sports, entertainment, religion, politics, crime, celebrity, and TikTok & meme. A total of 50 annotators annotated each comment three times and the…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting · Advanced Malware Detection Techniques
Methods: Support Vector Machine · fastText
