CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020: Identifying Offensive Language from Manglish Tweets
Sara Renjit, Sumam Mary Idicula

TL;DR
This paper presents a model for identifying offensive language in code-mixed Manglish tweets, addressing the challenge of mixed-language social media content using embedding-based classification.
Contribution
It introduces a novel approach applying embedding models to classify offensive content in code-mixed Dravidian language tweets, specifically for the HASOC-Dravidian-CodeMix task.
Findings
Effective identification of offensive language in Manglish tweets
Demonstrated the utility of embedding models for code-mixed language classification
Achieved competitive results in the HASOC-Dravidian-CodeMix task
Abstract
With the popularity of social media, communications through blogs, Facebook, Twitter, and other platforms have increased. Initially, English was the only medium of communication. Fortunately, now we can communicate in any language. This has led to people using English and their own native or mother tongue language in a mixed form. Sometimes, comments in other languages are written in an English transliterated format; in other cases, people use the intended language scripts. Identifying sentiments and offensive content from such code-mixed tweets is a necessary task in these times. We present a working model submitted for Task2 of the sub-track HASOC Offensive Language Identification-DravidianCodeMix in Forum for Information Retrieval Evaluation, 2020. It is a message-level classification task. An embedding model-based classifier identifies offensive and not offensive comments in our approach. We…
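The abstract is cut off before it names the embedding model, so the following is only a minimal sketch of what message-level, embedding-based classification can look like, not the authors' system. It assumes a swapped-in technique: each message is represented as the mean of hashed character-trigram vectors, which tolerates the inconsistent spelling of transliterated text, and a logistic regression is trained on top. The names `DIM`, `BUCKETS`, `TABLE`, `TOY_DATA`, and all functions are hypothetical, and the example messages are invented English placeholders, not drawn from the HASOC dataset.

```python
import math
import random
import zlib

DIM = 16        # embedding width (illustrative assumption)
BUCKETS = 2048  # number of hashed trigram buckets (illustrative assumption)

_rng = random.Random(0)
# Fixed random vectors stand in for learned or pre-trained embeddings.
TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(BUCKETS)]

# Invented toy messages (1 = offensive, 0 = not offensive); not real task data.
TOY_DATA = [
    ("you are a stupid idiot", 1),
    ("what a lovely song", 0),
    ("shut up you fool", 1),
    ("nice movie, loved it", 0),
]


def char_ngrams(text, n=3):
    """Return lowercase character n-grams of the space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def embed(text):
    """Mean of the hashed trigram vectors: one fixed-size vector per message."""
    grams = char_ngrams(text)
    vec = [0.0] * DIM
    for gram in grams:
        # crc32 gives a hash that is stable across Python processes.
        row = TABLE[zlib.crc32(gram.encode("utf-8")) % BUCKETS]
        for j in range(DIM):
            vec[j] += row[j]
    count = max(len(grams), 1)
    return [v / count for v in vec]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(text, weights, bias):
    """Estimated probability that a message is offensive."""
    x = embed(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(examples, epochs=200, lr=0.5):
    """Fit logistic regression on message embeddings with plain SGD."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = embed(text)
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


if __name__ == "__main__":
    weights, bias = train(TOY_DATA)
    for message in ("lovely movie", "stupid fool"):
        print(message, round(predict_proba(message, weights, bias), 3))
```

In a realistic setting the random table would be replaced by embeddings trained on code-mixed text, and the model would be evaluated on held-out data rather than on four toy messages.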
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism
