On Importance of Code-Mixed Embeddings for Hate Speech Identification
Shruti Jagdale, Omkar Khade, Gauri Takalikar, Mihir Inamdar, Raviraj Joshi

TL;DR
This paper investigates the importance of code-mixed embeddings for hate speech detection, showing that HingBERT and Hing-FastText models trained on Hindi-English data outperform standard models on code-mixed hate speech datasets.
Contribution
It demonstrates that code-mixed embeddings, and models pretrained on code-mixed Hindi-English data, improve hate speech detection on code-mixed text.
Findings
HingBERT outperforms BERT on hate speech detection in code-mixed data.
Hing-FastText surpasses standard FastText and vanilla BERT models.
Training on extensive Hindi-English data enhances model performance.
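The Hing-FastText finding is easier to picture with a small illustration of how FastText-style models represent a word: as a bag of character n-grams, so that romanized Hindi spelling variants share most of their subword units. The sketch below is not the paper's code; the function names, the example words, and the Jaccard overlap measure are assumptions chosen for illustration, and the n-gram range of 3 to 6 is simply the usual FastText default.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of FastText-style character n-grams for one word.

    Hypothetical helper for illustration; real FastText also hashes
    n-grams into buckets and learns a vector for each bucket.
    """
    marked = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two romanized spellings of the same Hindi word (illustrative choice).
variants = ("nahi", "nahin")
overlap = jaccard(char_ngrams(variants[0]), char_ngrams(variants[1]))
print(f"shared n-gram ratio for {variants}: {overlap:.2f}")
```

Because the two spellings share several n-grams, a subword model can place them near each other even if one spelling is rare in the training corpus. Whether this accounts for the reported gains is not something this sketch can show; the paper attributes them to training on Hindi-English code-mixed data.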
Abstract
Code-mixing is the practice of using two or more languages in a single sentence, which often occurs in multilingual communities such as India where people commonly speak multiple languages. Classic NLP tools, trained on monolingual data, face challenges when dealing with code-mixed data. Extracting meaningful information from sentences containing multiple languages becomes difficult, particularly in tasks like hate speech detection, due to linguistic variation, cultural nuances, and data sparsity. To address this, we aim to analyze the significance of code-mixed embeddings and evaluate the performance of BERT and HingBERT models (trained on a Hindi-English corpus) in hate speech detection. Our study demonstrates that HingBERT models, benefiting from training on the extensive Hindi-English dataset L3Cube-HingCorpus, outperform BERT models when tested on hate speech text datasets. We also…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Adam · Residual Connection · Weight Decay · Softmax · Multi-Head Attention
