Mind Your Language: Abuse and Offense Detection for Code-Switched Languages

Raghav Kapoor; Yaman Kumar; Kshitij Rajput; Rajiv Ratn Shah; Ponnurangam Kumaraguru; Roger Zimmermann

arXiv:1809.08652 · cs.CL · September 25, 2018

TL;DR

This paper develops a transfer learning-based LSTM model for detecting hate speech in Hinglish, a popular code-switched language, achieving state-of-the-art performance and providing resources for future research.

Contribution

It introduces the first effective model for offensive language detection in Hinglish, addressing challenges of non-standard grammar and vocabulary.

Findings

01. Model surpasses existing approaches in accuracy.

02. Achieves state-of-the-art results in Hinglish offensive text classification.

03. Provides publicly available model and embeddings for research.

Abstract

In multilingual societies like the Indian subcontinent, code-switched languages are popular and convenient for users. In this paper, we study offense and abuse detection in the code-switched pair of Hindi and English (i.e., Hinglish), the most widely spoken such pair. The task is made difficult by the non-fixed grammar, vocabulary, semantics, and spellings of Hinglish. We apply transfer learning and build an LSTM-based model for hate speech classification. This model surpasses the performance of the current best models, establishing itself as the state of the art in the unexplored domain of Hinglish offensive text classification. We also release our model and the trained embeddings for research purposes.
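
To make the LSTM component named in the abstract concrete, here is a minimal sketch of an LSTM forward pass with sigmoid gates and a tanh candidate, written in plain Python. It is not the authors' architecture: the dimensions, the random weights, the single sigmoid output, and the helper names (`init_params`, `encode`, `classify`) are all illustrative assumptions, and the transfer learning step is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """Run one LSTM time step on plain Python lists."""
    z = x + h_prev  # concatenate the input with the previous hidden state

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[gate], b[gate])]

    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    o = [sigmoid(a) for a in affine("o")]    # output gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell values
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = random.Random(seed)
    W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
         for gate in "ifog"}
    b = {gate: [0.0] * hidden_dim for gate in "ifog"}
    return W, b


def encode(sequence, W, b, hidden_dim):
    """Feed a sequence of embedding vectors through the LSTM; return the last hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        h, c = lstm_step(x, h, c, W, b)
    return h


def classify(sequence, W, b, w_out, b_out, hidden_dim):
    """Assumed output layer: one sigmoid unit scoring the text as offensive or not."""
    h = encode(sequence, W, b, hidden_dim)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


if __name__ == "__main__":
    W, b = init_params(input_dim=3, hidden_dim=4)
    # Two made-up 3-dimensional "word embeddings" standing in for a tokenized tweet.
    tweet = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]
    print(classify(tweet, W, b, w_out=[0.5] * 4, b_out=0.0, hidden_dim=4))
```

In a transfer learning setup, the weights in `W` and the input embeddings would be initialized from a model trained on a related task rather than drawn at random, then fine-tuned on Hinglish data. The details of that step are left to the paper.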

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory