Predicting the Type and Target of Offensive Social Media Posts in Marathi

Marcos Zampieri; Tharindu Ranasinghe; Mrinal Chaudhari; Saurabh Gaikwad; Prajwal Krishna; Mayuresh Nene; Shrunali Paygude

arXiv:2211.12570 · cs.CL · November 24, 2022

TL;DR

This paper introduces MOLD 2.0, a comprehensive hierarchical dataset for offensive language detection in Marathi, a low-resource language, and demonstrates experiments to improve automatic recognition of offensive content.

Contribution

It presents the first hierarchical offensive language dataset for Marathi and expands research into low-resource Indo-Aryan languages using semi-supervised annotation methods.

Findings

01. MOLD 2.0 is much larger than the original MOLD and extends its annotation to levels B (type) and C (target) of the OLID taxonomy.

02. Hierarchical annotation improves offensive language classification.

03. Semi-supervised methods enhance dataset annotation quality.

Abstract

Offensive language is very common on social media, motivating platforms to invest in strategies to make communities safer. This includes developing robust machine learning systems capable of recognizing offensive content online. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English and a few other high-resource languages such as French, German, and Spanish. In this paper we address this gap by tackling offensive language identification in Marathi, a low-resource Indo-Aryan language spoken in India. We introduce the Marathi Offensive Language Dataset v.2.0, or MOLD 2.0, and present multiple experiments on this dataset. MOLD 2.0 is a much larger version of MOLD with expanded annotation to the levels B (type) and C (target) of the popular OLID taxonomy. MOLD 2.0 is the first hierarchical offensive language…
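
The three-level OLID scheme that the abstract refers to can be sketched as a small consistency check. The label names (OFF/NOT, TIN/UNT, IND/GRP/OTH) follow the original OLID taxonomy; the function name, the argument layout, and the rule that every offensive post carries a level B label are illustrative assumptions, not details taken from the MOLD 2.0 release.

```python
from typing import Optional

# Label sets as defined in the OLID taxonomy.
LEVEL_A = {"OFF", "NOT"}         # level A: offensive vs. not offensive
LEVEL_B = {"TIN", "UNT"}         # level B (type): targeted insult vs. untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # level C (target): individual, group, other


def is_valid_annotation(a: str,
                        b: Optional[str] = None,
                        c: Optional[str] = None) -> bool:
    """Check that a (level A, level B, level C) triple respects the hierarchy.

    Hypothetical helper: level B is only defined for offensive posts,
    and level C is only defined for targeted posts.
    """
    if a not in LEVEL_A:
        return False
    if a == "NOT":
        # Non-offensive posts carry no type or target label.
        return b is None and c is None
    if b not in LEVEL_B:
        # Assumption: every offensive post must have a level B label.
        return False
    if b == "UNT":
        # Untargeted posts carry no target label.
        return c is None
    # Targeted insults must name a target category.
    return c in LEVEL_C
```

A check like this could be run over an annotated file to catch rows where a lower-level label is present without its parent label.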

Code & Models

Repositories

tharindudr/mold (official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection