CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020: Identifying Offensive Language from Manglish Tweets

Sara Renjit; Sumam Mary Idicula

arXiv:2010.08756 · cs.CL · October 20, 2020 · 5 cites

TL;DR

This paper presents a model for identifying offensive language in code-mixed Manglish tweets, addressing the challenge of mixed-language social media content using embedding-based classification.

Contribution

It introduces a novel approach applying embedding models to classify offensive content in code-mixed Dravidian language tweets, specifically for the HASOC-Dravidian-CodeMix task.

Findings

01 Effective identification of offensive language in Manglish tweets

02 Demonstrated the utility of embedding models for code-mixed language classification

03 Achieved competitive results in the HASOC-Dravidian-CodeMix task

Abstract

With the popularity of social media, communication through blogs, Facebook, Twitter, and other platforms has increased. Initially, English was the only medium of communication; now people can communicate in any language. This has led people to mix English with their native language. Sometimes comments in other languages are written in an English-transliterated form; in other cases, people use the script of the intended language. Identifying sentiment and offensive content in such code-mixed tweets is therefore a necessary task. We present a working model submitted for Task 2 of the sub-track HASOC Offensive Language Identification-DravidianCodeMix at the Forum for Information Retrieval Evaluation, 2020. It is a message-level classification task. In our approach, an embedding model-based classifier labels comments as offensive or not offensive. We…
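
To make the idea of a message-level, embedding-based classifier concrete, here is a minimal sketch. It is not the authors' model: the abstract does not specify the embedding or the classifier, so this sketch swaps in hashed character trigrams as a stand-in embedding and a nearest-centroid rule as a stand-in classifier. The function names, the label names `offensive` and `not_offensive`, the dimension `DIM`, and the toy strings are all assumptions made for illustration; the strings are invented and do not come from the HASOC dataset.

```python
import hashlib
import math

DIM = 64  # embedding size; an arbitrary choice for this sketch


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded message."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def embed(text):
    """Map a message to one fixed-size vector by hashing its character n-grams.

    This hashing trick is a stand-in for a trained embedding model (assumption).
    """
    vec = [0.0] * DIM
    for gram in char_ngrams(text):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = digest[0] % DIM                       # bucket for this n-gram
        sign = 1.0 if digest[1] % 2 == 0 else -1.0    # signed hashing
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec   # unit length when possible


def centroid(vectors):
    """Average a list of equal-length vectors component by component."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(examples):
    """Build one centroid per label from (text, label) pairs."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(embed(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, text):
    """Return the label whose centroid has the largest dot product with the message."""
    vec = embed(text)
    return max(model, key=lambda lab: sum(a * b for a, b in zip(vec, model[lab])))


# Invented toy messages in romanized form; NOT taken from the task data.
toy_examples = [
    ("padam super aayi", "not_offensive"),
    ("nalla song aanu", "not_offensive"),
    ("nee oru mandan aanu", "offensive"),
    ("ninte post verum mosham", "offensive"),
]

model = train(toy_examples)
prediction = predict(model, "ee padam nalla aanu")
```

Character n-grams are used here because transliterated text has no standard spelling, so subword units tolerate spelling variation better than whole words. A real system would likely replace `embed` with a trained embedding model and the centroid rule with a learned classifier.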

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism