Improving code-mixed hate detection by native sample mixing: A case study for Hindi-English code-mixed scenario
Debajyoti Mazumder, Aakash Kumar, Jasabanta Patro

TL;DR
This study improves Hindi-English code-mixed hate speech detection by adding native-language hate samples to the training data. It shows that even a small number of native samples significantly improves model performance and sharpens the model's focus on hate words.
Contribution
The paper introduces a novel native sample mixing approach for code-mixed hate detection and provides empirical evidence of its effectiveness using multilingual language models.
Findings
Adding native hate samples improves the performance of multilingual language models (MLMs) on code-mixed hate detection.
MLMs trained solely on native samples can effectively detect code-mixed hate.
Native sample inclusion helps models focus on hate-emitting words in code-mixed contexts.
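The idea behind the findings above, augmenting a code-mixed training set with native-language samples, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `mix_native_samples`, the `native_fraction` parameter, the random-subset strategy, and the placeholder data are all assumptions made for this example, since the summary does not specify how samples were selected or how many were added.

```python
import random
from typing import List, Tuple

# (text, label) pairs; label 1 = hate, 0 = non-hate (hypothetical encoding)
Sample = Tuple[str, int]


def mix_native_samples(
    code_mixed: List[Sample],
    native: List[Sample],
    native_fraction: float,
    seed: int = 0,
) -> List[Sample]:
    """Return the code-mixed set plus a random subset of native samples.

    native_fraction is the share of the native pool to add (0.0 to 1.0).
    This selection rule is an assumption for illustration only.
    """
    if not 0.0 <= native_fraction <= 1.0:
        raise ValueError("native_fraction must be between 0 and 1")
    rng = random.Random(seed)
    k = round(native_fraction * len(native))
    extra = rng.sample(native, k)      # sample without replacement
    mixed = list(code_mixed) + extra   # leave the input lists untouched
    rng.shuffle(mixed)                 # interleave the two sources
    return mixed


# Placeholder data standing in for real annotated corpora.
code_mixed_train = [("cm_text_1", 1), ("cm_text_2", 0), ("cm_text_3", 1)]
native_pool = [("hi_text_1", 1), ("hi_text_2", 0),
               ("en_text_1", 1), ("en_text_2", 0)]

train_set = mix_native_samples(code_mixed_train, native_pool, native_fraction=0.5)
print(len(train_set))  # 3 code-mixed + 2 native = 5
```

The resulting list would then be passed to whatever fine-tuning routine is used for the multilingual model. Setting `native_fraction` to 0.0 recovers the code-mixed-only baseline, which makes it easy to compare training with and without native samples.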
Abstract
Hate detection has long been a challenging task for the NLP community. The task becomes complex in a code-mixed environment because the models must understand the context and the hate expressed through language alteration. Compared to the monolingual setup, we see much less work on code-mixed hate as large-scale annotated hate corpora are unavailable for the study. To overcome this bottleneck, we propose using native language hate samples (native language samples or native samples hereafter). We hypothesise that in the era of multilingual language models (MLMs), hate in code-mixed settings can be detected by relying mainly on the native language samples. Even though the NLP literature reports the effectiveness of MLMs on hate detection in many cross-lingual settings, their extensive evaluation in a code-mixed scenario is yet to be done. This paper attempts to fill this gap through…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Bullying, Victimization, and Aggression · Advanced Malware Detection Techniques
Methods: Focus
