BIDWESH: A Bangla Regional Based Hate Speech Detection Dataset

Azizul Hakim Fayaz; MD. Shorif Uddin; Rayhan Uddin Bhuiyan; Zakia Sultana; Md. Samiul Islam; Bidyarthi Paul; Tashreef Muhammad; Shahriar Manzoor

arXiv:2507.16183 · cs.CL · July 23, 2025

TL;DR

BIDWESH is a multi-dialect Bangla hate speech dataset, presented as the first of its kind, that aims to improve detection of harmful content written in regional dialects, a gap left open by existing tools for this low-resource language.

Contribution

This study introduces the first multi-dialectal Bangla hate speech dataset, built by translating and annotating 9,183 instances from the BD-SHS corpus into three major regional dialects.

Findings

01. Dataset covers three major Bangla dialects with 9,183 annotated instances.

02. Manual verification ensures linguistic and contextual accuracy.

03. Provides a resource for developing dialect-sensitive hate speech detection models.

Abstract

Hate speech on digital platforms has become a growing concern globally, especially in linguistically diverse countries like Bangladesh, where regional dialects play a major role in everyday communication. Despite progress in hate speech detection for standard Bangla, existing datasets and systems fail to address the informal and culturally rich expressions found in dialects such as Barishal, Noakhali, and Chittagong. This oversight results in limited detection capability and biased moderation, leaving large sections of harmful content unaccounted for. To address this gap, this study introduces BIDWESH, the first multi-dialectal Bangla hate speech dataset, constructed by translating and annotating 9,183 instances from the BD-SHS corpus into three major regional dialects. Each entry was manually verified and labeled for hate presence, type (slander, gender, religion, call to violence),…
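
The annotation scheme described in the abstract can be pictured as one record per dialect sentence. The Python sketch below is illustrative only and is not the dataset's actual file format: the class and field names (`DialectEntry`, `is_hate`, `hate_type`) and the helper `hate_counts_by_dialect` are hypothetical, the sample texts are placeholders rather than real data, and treating Barishal, Noakhali, and Chittagong as the three covered dialects is an assumption based on the examples the abstract names. Only the labels visible in the truncated abstract are modeled.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Dialect(Enum):
    # Assumption: the abstract names these dialects as examples; the sketch
    # treats them as the three dialects covered by the dataset.
    BARISHAL = "Barishal"
    NOAKHALI = "Noakhali"
    CHITTAGONG = "Chittagong"


class HateType(Enum):
    # The four hate types listed in the abstract.
    SLANDER = "slander"
    GENDER = "gender"
    RELIGION = "religion"
    CALL_TO_VIOLENCE = "call to violence"


@dataclass(frozen=True)
class DialectEntry:
    """Hypothetical record layout; field names are not taken from the paper."""
    text: str                             # sentence rendered in the dialect
    dialect: Dialect
    is_hate: bool                         # the "hate presence" label
    hate_type: Optional[HateType] = None  # left unset when no type applies


def hate_counts_by_dialect(entries: Iterable[DialectEntry]) -> Counter:
    """Count entries labeled as hate, grouped by dialect."""
    return Counter(e.dialect for e in entries if e.is_hate)


# Placeholder rows, not real dataset content.
sample = [
    DialectEntry("<placeholder text 1>", Dialect.BARISHAL, True, HateType.SLANDER),
    DialectEntry("<placeholder text 2>", Dialect.BARISHAL, False),
    DialectEntry("<placeholder text 3>", Dialect.NOAKHALI, True, HateType.RELIGION),
]
counts = hate_counts_by_dialect(sample)
```

A per-dialect count like this is one simple way to check label balance across dialects before training a dialect-sensitive classifier.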

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection