Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss
Debapriya Tula, Shreyas MS, Viswanatha Reddy, Pranjal Sahu, Sumanth Doddapaneni, Prathyush Potluri, Rohan Sukumaran, Parth Patwa

TL;DR
This paper presents an approach to offensive language detection in Dravidian languages that addresses code-mixing, class imbalance, multilingualism, and mixed scripts using a new CMI-based focal loss and a cosine classifier.
Contribution
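The paper introduces a Code-Mixing Index (CMI) based focal loss and a cosine classifier for offense detection in low-resource, code-mixed, multilingual Dravidian languages. The loss targets code-mixing and class imbalance; the cosine classifier replaces the conventional dot product-based classifier.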
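To make the classifier change concrete, here is a minimal sketch contrasting dot-product logits with cosine-similarity logits. It is an illustration under stated assumptions, not the authors' implementation: the function names and the `scale` temperature are hypothetical and do not come from the paper.

```python
import math


def dot_logits(features, class_weights):
    """Conventional classifier: one dot product per class weight vector."""
    return [sum(f * w for f, w in zip(features, row)) for row in class_weights]


def cosine_logits(features, class_weights, scale=10.0, eps=1e-12):
    """Cosine classifier: dot product of L2-normalised feature and weight vectors.

    `scale` is a hypothetical temperature that widens the [-1, 1] cosine range
    before a softmax; it is an assumption, not a value taken from the paper.
    """
    f_norm = math.sqrt(sum(f * f for f in features)) + eps
    logits = []
    for row in class_weights:
        w_norm = math.sqrt(sum(w * w for w in row)) + eps
        dot = sum(f * w for f, w in zip(features, row))
        logits.append(scale * dot / (f_norm * w_norm))
    return logits


# Toy weights for two classes (made-up numbers).
weights = [[3.0, 4.0], [-3.0, -4.0]]
small = cosine_logits([3.0, 4.0], weights, scale=1.0)
large = cosine_logits([30.0, 40.0], weights, scale=1.0)
```

In this sketch, multiplying the feature vector by ten leaves the cosine logits unchanged, while the dot-product logits grow tenfold. Logits that do not depend on vector magnitude are one commonly cited reason for using cosine classifiers on imbalanced data.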
Findings
Improved detection accuracy with the proposed CMI focal loss
Effective handling of code-mixed and mixed-script instances
Enhanced performance in low-resource multilingual settings
Abstract
Over the past decade, we have seen exponential growth in online content fueled by social media platforms. Data generation at this scale comes with the caveat of an overwhelming amount of offensive content. The complexity of identifying offensive content is exacerbated by the use of multiple modalities (image, language, etc.), code-mixed language, and more. Moreover, even after careful sampling and annotation of offensive content, there will always exist a significant class imbalance between offensive and non-offensive content. In this paper, we introduce a novel Code-Mixing Index (CMI) based focal loss which addresses two challenges in Dravidian language offense detection: (1) code-mixing in languages and (2) class imbalance. We also replace the conventional dot product-based classifier with a cosine-based classifier, which results in a boost in performance. Further, we use…
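The abstract does not say how the CMI enters the focal loss, so the following is only a sketch of the two ingredients. It assumes the commonly used Das and Gambäck form of the Code-Mixing Index and the standard focal loss. The `cmi_weighted_focal_loss` function is a hypothetical combination for illustration, not the formula from the paper, and the language tags (`"ta"`, `"en"`, `"univ"`) are made-up examples.

```python
import math
from collections import Counter


def code_mixing_index(lang_tags, neutral_tag="univ"):
    """CMI = 100 * (1 - max_lang_count / (n - u)), or 0 if no language tokens.

    n is the number of tokens and u the number of language-neutral tokens
    (marked here with the assumed tag "univ").
    """
    lang_counts = Counter(t for t in lang_tags if t != neutral_tag)
    n_lang_tokens = sum(lang_counts.values())  # this is n - u
    if n_lang_tokens == 0:
        return 0.0
    return 100.0 * (1.0 - max(lang_counts.values()) / n_lang_tokens)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Standard focal loss for one example: -alpha * (1 - p_t)^gamma * log(p_t).

    p_true is the predicted probability of the correct class, in (0, 1].
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


def cmi_weighted_focal_loss(p_true, cmi, gamma=2.0):
    """Hypothetical combination: up-weight more heavily code-mixed inputs."""
    weight = 1.0 + cmi / 100.0  # assumed weighting, not taken from the paper
    return weight * focal_loss(p_true, gamma=gamma)


# Two Tamil tokens, one English token, one neutral token (e.g. an emoji).
cmi = code_mixing_index(["ta", "ta", "en", "univ"])
loss = cmi_weighted_focal_loss(p_true=0.9, cmi=cmi)
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy. Larger `gamma` shrinks the loss on confidently correct examples, which is how focal loss reduces the dominance of the majority class.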
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques
Methods: Focal Loss · Network On Network
