Mind Your Language: Abuse and Offense Detection for Code-Switched Languages
Raghav Kapoor, Yaman Kumar, Kshitij Rajput, Rajiv Ratn Shah, Ponnurangam Kumaraguru, Roger Zimmermann

TL;DR
This paper develops a transfer learning-based LSTM model for detecting hate speech in Hinglish, a popular code-switched language, achieving state-of-the-art performance and providing resources for future research.
Contribution
It introduces an LSTM-based model, trained with transfer learning, for offensive language detection in Hinglish, addressing the challenges of non-fixed grammar, vocabulary, semantics, and spelling.
Findings
The model surpasses the performance of the previous best models.
It establishes a new state of the art in Hinglish offensive text classification.
The model and the trained embeddings are released for research purposes.
Abstract
In multilingual societies like the Indian subcontinent, the use of code-switched languages is popular and convenient for users. In this paper, we study offense and abuse detection in the code-switched pair of Hindi and English (i.e., Hinglish), the most widely spoken such pair. The task is made difficult by the non-fixed grammar, vocabulary, semantics, and spellings of Hinglish. We apply transfer learning and build an LSTM-based model for hate speech classification. This model surpasses the performance of the current best models, establishing itself as the state of the art in the unexplored domain of Hinglish offensive text classification. We also release our model and the trained embeddings for research purposes.
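To make the LSTM building block named in the abstract concrete, here is a minimal sketch of a single LSTM layer in plain Python, showing where the sigmoid gates and tanh activations enter. It is an illustration only, not the authors' model: the function names (`lstm_step`, `init_params`, `encode`), the random weights, and the toy input vectors are all assumptions, and the transfer-learning step and the classification layer are omitted.

```python
import math
import random


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Randomly initialise one (weights, bias) pair per gate (hypothetical setup)."""
    rng = random.Random(seed)

    def layer():
        weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
            for _ in range(hidden_size)
        ]
        bias = [0.0] * hidden_size
        return weights, bias

    return {name: layer() for name in ("input", "forget", "output", "candidate")}


def lstm_step(x, h_prev, c_prev, params):
    """Run one LSTM time step on vectors represented as plain lists."""
    z = x + h_prev  # concatenate current input with previous hidden state

    def affine(name):
        weights, bias = params[name]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(weights, bias)]

    i = [sigmoid(a) for a in affine("input")]        # input gate
    f = [sigmoid(a) for a in affine("forget")]       # forget gate
    o = [sigmoid(a) for a in affine("output")]       # output gate
    g = [math.tanh(a) for a in affine("candidate")]  # candidate cell values

    # New cell state: keep part of the old state, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def encode(sequence, params, hidden_size):
    """Feed a sequence of token vectors through the LSTM; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


# Toy usage: two 3-dimensional "token embeddings" (made-up numbers).
params = init_params(input_size=3, hidden_size=4)
final_state = encode([[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]], params, hidden_size=4)
```

In a classifier of the kind the abstract describes, the final hidden state would typically be passed to an output layer that scores each class. With transfer learning, the weights would start from a model trained on a related task rather than from random values.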
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
