NLPineers@ NLU of Devanagari Script Languages 2025: Hate Speech Detection using Ensembling of BERT-based models

Anmol Guragain; Nadika Poudel; Rajesh Piryani; Bishesh Khanal

arXiv:2412.08163·cs.CL·December 13, 2024

TL;DR

This paper develops an ensemble of BERT-based models for hate speech detection in Hindi and Nepali, reaching a recall of 0.7762 (rank 3 of 31) and an F1 score of 0.6914 (rank 17 of 31), and uses backtranslation-based data augmentation to handle class imbalance.

Contribution

It applies an ensemble of multilingual BERT-based models to hate speech detection in Devanagari-scripted languages, and augments the training data through backtranslation, using cosine similarity to keep labels consistent after augmentation.

Findings

01

The best-performing model achieved a recall of 0.7762 (rank 3/31) and an F1 score of 0.6914 (rank 17/31).

02

The best-performing system was an ensemble of multilingual BERT models rather than a single model.

03

Backtranslation was used for data augmentation to address class imbalance, with cosine similarity used to preserve label consistency.
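
As a rough illustration of what ensembling several classifiers can look like, the sketch below averages per-class probabilities across models (soft voting) and picks the highest-scoring class. This page does not state how the authors combine their models, so soft voting is an assumption here, and the function name `soft_vote` and the probability values are hypothetical.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models and return argmax labels.

    prob_sets[m][i][c] is the probability that model m assigns to class c
    for example i. Soft voting is an assumed combination rule, not
    necessarily the one used in the paper.
    """
    n_models = len(prob_sets)
    n_examples = len(prob_sets[0])
    labels = []
    for i in range(n_examples):
        n_classes = len(prob_sets[0][i])
        # Mean probability per class over all models for this example.
        avg = [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


# Hypothetical outputs of three models on two texts (class 1 = hate speech).
model_a = [[0.60, 0.40], [0.90, 0.10]]
model_b = [[0.40, 0.60], [0.70, 0.30]]
model_c = [[0.35, 0.65], [0.45, 0.55]]

predictions = soft_vote([model_a, model_b, model_c])
print(predictions)
```

In this toy input, the first text is labelled as hate speech even though one model disagrees, because the averaged probability for class 1 is higher. Hard (majority) voting or weighted averaging are alternative rules that would fit the same interface.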

Abstract

This paper explores hate speech detection in Devanagari-scripted languages, focusing on Hindi and Nepali, for Subtask B of the CHIPSAL@COLING 2025 Shared Task. Using a range of transformer-based models such as XLM-RoBERTa, MuRIL, and IndicBERT, we examine their effectiveness in navigating the nuanced boundary between hate speech and free expression. Our best-performing model, implemented as an ensemble of multilingual BERT models, achieves a recall of 0.7762 (rank 3/31 in terms of recall) and an F1 score of 0.6914 (rank 17/31). To address class imbalance, we used backtranslation for data augmentation, and cosine similarity to preserve label consistency after augmentation. This work emphasizes the need for hate speech detection in Devanagari-scripted languages and presents a foundation for further research.
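
The cosine-similarity check on backtranslated text can be sketched as follows: an augmented sentence is kept only if its embedding stays close to the embedding of the original. This is a minimal sketch under stated assumptions. The embedding model and the threshold are not given on this page, so the vectors below are toy values, and `filter_augmented` and the `0.8` threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_augmented(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> List[int]:
    """Return indices of (original, backtranslated) embedding pairs to keep.

    A pair is kept when the backtranslated sentence is still similar enough
    to the original that reusing the original label is plausible.
    """
    return [
        i
        for i, (original, augmented) in enumerate(pairs)
        if cosine_similarity(original, augmented) >= threshold
    ]


# Toy embeddings: the first backtranslation stays close to its source,
# the second drifts to an unrelated meaning.
pairs = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([1.0, 0.0], [0.0, 1.0]),
]

kept = filter_augmented(pairs)
print(kept)
```

In practice the embeddings would come from a sentence encoder applied to the original and backtranslated texts; raising the threshold trades augmentation volume for stricter label consistency.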

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Attention Is All You Need · Softmax · Linear Layer · Linear Warmup With Linear Decay · Multi-Head Attention · Weight Decay · WordPiece · Layer Normalization · Residual Connection