Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media

Ananya Srivastava; Mohammed Hasan; Bhargav Yagnik; Rahee Walambe; Ketan Kotecha

arXiv:2105.04913·cs.CL·May 12, 2021

Open Access

TL;DR

This paper presents a novel AI-based approach for detecting hate speech in Hinglish social media data using contextual embeddings, significantly improving detection accuracy over existing methods.

Contribution

It introduces a fine-tuning methodology employing ELMo, FLAIR, and BERT for Hinglish hate speech detection, addressing the challenge of code-mixed language analysis.

Findings

01. Proposed model outperforms existing methods in accuracy

02. Utilizes contextual embeddings for better language understanding

03. Effective on multiple Hinglish datasets

Abstract

Social networking platforms provide a conduit to disseminate our ideas, views and thoughts and proliferate information. This has led to the amalgamation of English with natively spoken languages. Prevalence of Hindi-English code-mixed data (Hinglish) is on the rise with most of the urban population all over the world. Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages. Thus, the worldwide hate speech detection rate of around 44% drops even more considering the content in Indian colloquial languages and slang. In this paper, we propose a methodology for efficient detection of unstructured code-mixed Hinglish language. Fine-tuning based approaches for Hindi-English code-mixed language are employed by utilizing contextual embeddings such as ELMo (Embeddings for…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting · Swearing, Euphemism, Multilingualism

Methods: Attention Is All You Need · Linear Layer · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Layer Normalization · Linear Warmup With Linear Decay · Multi-Head Attention · Residual Connection · WordPiece
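
Of the methods listed above, "Linear Warmup With Linear Decay" is a learning-rate schedule commonly used when fine-tuning BERT-style models. The sketch below shows one plausible form of it in plain Python. The function name, the argument names, and the example values (peak rate 2e-5, 10 warmup steps, 100 total steps) are illustrative assumptions; this page does not state the hyperparameters the paper used.

```python
def linear_warmup_linear_decay(step, total_steps, warmup_steps, peak_lr):
    """Return the learning rate at a given optimizer step.

    The rate rises linearly from 0 to peak_lr over warmup_steps,
    then falls linearly back to 0 at total_steps.
    All argument names and values are hypothetical, for illustration only.
    """
    if total_steps <= 0 or not 0 <= warmup_steps <= total_steps:
        raise ValueError("need total_steps > 0 and 0 <= warmup_steps <= total_steps")

    # Clamp the step into the valid range so out-of-range calls stay defined.
    step = min(max(step, 0), total_steps)

    # Warmup phase: fraction of the peak rate proportional to progress.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps

    # Decay phase: fraction of the peak rate proportional to steps remaining.
    decay_steps = total_steps - warmup_steps
    if decay_steps == 0:
        return peak_lr
    return peak_lr * (total_steps - step) / decay_steps


if __name__ == "__main__":
    # Print the schedule at a few steps (example values are assumptions).
    for s in (0, 5, 10, 55, 100):
        print(s, linear_warmup_linear_decay(s, 100, 10, 2e-5))
```

In a training loop, a function like this would typically be evaluated once per optimizer step and the result assigned to the optimizer's learning rate. Deep learning libraries ship their own versions of this schedule, which would normally be preferred over a hand-written one.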