Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss

Debapriya Tula, Shreyas MS, Viswanatha Reddy, Pranjal Sahu, Sumanth Doddapaneni, Prathyush Potluri, Rohan Sukumaran, Parth Patwa

arXiv:2111.06916 · cs.CL · May 9, 2022

TL;DR

This paper presents a novel approach for offensive language detection in Dravidian languages that effectively handles code-mixing, class imbalance, multilingualism, and mixed scripts using a new CMI-based focal loss and cosine classifier.
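The Code-Mixing Index (CMI) measures how mixed a single comment is. The sketch below uses the widely used word-level definition, CMI = 100 × (1 − max_i w_i / (n − u)) when n > u and 0 otherwise, where n is the number of tokens, u the number of language-independent tokens, and w_i the number of tokens in language i. It assumes every token already carries a language tag; the tag names and the `NEUTRAL_TAGS` set are hypothetical, and this is an illustration rather than the authors' code.

```python
from collections import Counter

# Hypothetical tags for language-independent tokens (emoji, numbers, names, ...).
NEUTRAL_TAGS = {"univ", "other"}


def code_mixing_index(lang_tags):
    """Word-level CMI: 100 * (1 - max_lang_count / (n - u)) if n > u, else 0."""
    n = len(lang_tags)
    # Count tokens per language, ignoring language-independent tokens.
    lang_counts = Counter(t for t in lang_tags if t not in NEUTRAL_TAGS)
    u = n - sum(lang_counts.values())
    if n == u:  # empty input or only language-independent tokens
        return 0.0
    return 100.0 * (1.0 - max(lang_counts.values()) / (n - u))


# Tamil-English comment with one language-independent token (tags are illustrative).
print(code_mixing_index(["ta", "en", "ta", "ta", "univ"]))
```

A monolingual comment scores 0, and the score grows as tokens are spread more evenly across languages.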

Contribution

The paper introduces a CMI-based focal loss and a cosine classifier tailored for offense detection in low-resource, code-mixed, and multilingual Dravidian languages, addressing two key challenges: code-mixing and class imbalance.
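Focal loss down-weights easy, well-classified examples through a modulating factor (1 − p_t)^γ and can apply a per-class weight α_t to counter class imbalance. The sketch below implements standard multi-class focal loss in plain Python. This page does not describe how the paper actually combines CMI with the loss, so `cmi_focal_loss` uses a purely hypothetical rule (scaling each sample's loss by 1 + CMI/100) only to show where a per-sample CMI term could enter; it is not the authors' formulation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]  # optional class weight
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t


def cmi_focal_loss(batch, gamma=2.0, alpha=None):
    """HYPOTHETICAL CMI weighting: each (logits, target, cmi) sample's focal
    loss is scaled by (1 + cmi / 100); batch must be non-empty."""
    losses = [
        (1.0 + cmi / 100.0) * focal_loss(logits, target, gamma, alpha)
        for logits, target, cmi in batch
    ]
    return sum(losses) / len(losses)
```

With γ = 0 and no α, `focal_loss` reduces to ordinary cross-entropy; larger γ shrinks the loss on confidently correct predictions so training focuses on hard and minority-class examples.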

Findings

1. Improved detection accuracy with the proposed CMI-based focal loss
2. Effective handling of code-mixed and mixed-script instances
3. Enhanced performance in low-resource multilingual settings

Abstract

Over the past decade, we have seen exponential growth in online content fueled by social media platforms. Data generation at this scale comes with the caveat of an insurmountable amount of offensive content. The complexity of identifying offensive content is exacerbated by the use of multiple modalities (image, language, etc.), code-mixed language, and more. Moreover, even after careful sampling and annotation of offensive content, there will always exist a significant class imbalance between offensive and non-offensive content. In this paper, we introduce a novel Code-Mixing Index (CMI) based focal loss which addresses two challenges in Dravidian language offense detection: (1) code-mixing and (2) class imbalance. We also replace the conventional dot-product-based classifier with a cosine-based classifier, which boosts performance. Further, we use…
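A cosine classifier computes each class logit from the cosine similarity between the L2-normalized feature vector and the L2-normalized class weight vector, multiplied by a scale s, instead of a raw dot product. The sketch below illustrates that general technique in plain Python; the function names and the default scale of 16 are assumptions, not values taken from the paper.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cosine_logits(features, class_weights, scale=16.0):
    """Logit_k = scale * cos(angle between features and class weight k)."""
    f = l2_normalize(features)
    logits = []
    for w in class_weights:
        w_hat = l2_normalize(w)
        logits.append(scale * sum(a * b for a, b in zip(f, w_hat)))
    return logits


# Two classes (e.g. not-offensive / offensive) with illustrative 2-d weights.
print(cosine_logits([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]], scale=1.0))
```

Because both vectors are normalized, every logit lies in [−s, s] and does not depend on the feature norm, so the prediction depends only on the direction of the feature vector; the scale s then controls how peaked the softmax over these logits is.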

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques

Methods: Focal Loss · Network On Network