Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media
Ananya Srivastava, Mohammed Hasan, Bhargav Yagnik, Rahee Walambe, and Ketan Kotecha

TL;DR
This paper presents a novel AI-based approach for detecting hate speech in Hinglish social media data using contextual embeddings, significantly improving detection accuracy over existing methods.
Contribution
It introduces a fine-tuning methodology employing ELMo, FLAIR, and BERT for Hinglish hate speech detection, addressing the challenge of code-mixed language analysis.
Findings
Proposed model outperforms existing methods in accuracy
Utilizes contextual embeddings for better language understanding
Effective on multiple Hinglish datasets
Abstract
Social networking platforms provide a conduit to disseminate our ideas, views and thoughts and to proliferate information. This has led to the amalgamation of English with natively spoken languages. The prevalence of Hindi-English code-mixed data (Hinglish) is rising among urban populations all over the world. Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages. Thus, the worldwide hate speech detection rate of around 44% drops even further once content in Indian colloquial languages and slang is considered. In this paper, we propose a methodology for efficient detection of hate speech in unstructured code-mixed Hinglish text. Fine-tuning based approaches for Hindi-English code-mixed language are employed by utilizing contextual embeddings such as ELMo (Embeddings for…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting · Swearing, Euphemism, Multilingualism
Methods: Attention Is All You Need · Linear Layer · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Layer Normalization · Linear Warmup With Linear Decay · Multi-Head Attention · Residual Connection · WordPiece
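
The methods listed above include linear warmup with linear decay, a learning-rate schedule commonly paired with BERT-style fine-tuning. The page does not report the paper's training hyperparameters, so the following is only a minimal sketch of the schedule itself. The function name, the step counts, and the peak rate of 2e-5 are illustrative assumptions, not values taken from the paper.

```python
def linear_warmup_linear_decay(step, total_steps, warmup_steps, peak_lr):
    """Return the learning rate at a given 0-indexed optimizer step.

    The rate rises linearly from 0 to peak_lr over warmup_steps,
    then falls linearly back to 0 at total_steps.
    All argument values used below are hypothetical.
    """
    if total_steps <= 0 or not 0 <= warmup_steps <= total_steps:
        raise ValueError("need total_steps > 0 and 0 <= warmup_steps <= total_steps")
    if step < warmup_steps:
        # Warmup phase: fraction of the peak proportional to progress.
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        # Past the end of training the rate stays at zero.
        return 0.0
    # Decay phase: remaining steps as a fraction of the decay window.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)


# Hypothetical run: 100 steps in total, 10 of them warmup, peak rate 2e-5.
schedule = [linear_warmup_linear_decay(s, 100, 10, 2e-5) for s in range(100)]
```

In this sketch the rate reaches its peak exactly at the last warmup step boundary and hits zero at `total_steps`. Deep learning frameworks ship their own schedulers for this purpose, which would normally be preferred over a hand-written function.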
