Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil
Shantanu Patankar, Omkar Gokhale, Onkar Litake, Aditya Mandke, Dipali Kadam

TL;DR
This paper addresses abusive comment detection in Tamil and code-mixed Tamil-English social media comments using ensemble, RNN, and Transformer models, achieving macro-averaged F1 scores of 0.43 on the Tamil data and 0.45 on the code-mixed data.
Contribution
It applies three families of models (ensemble, RNN, and Transformer) to abusive comment detection in low-resource Tamil and code-mixed Tamil-English data, and reports which models performed best on each dataset.
Findings
MuRIL and XLM-RoBERTa achieved a macro-averaged F1 score of 0.43 on the Tamil data.
MuRIL and M-BERT achieved a macro-averaged F1 score of 0.45 on the code-mixed data.
Ensemble and Transformer models improve detection accuracy.
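The scores in this summary are macro-averaged F1: F1 is computed separately for each class and the per-class values are averaged with equal weight, so rare categories count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name macro_f1 and the toy labels "none" and "abusive" are hypothetical and are not the shared task's actual categories.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy, hypothetical labels: a classifier that always predicts the majority class.
y_true = ["none", "none", "none", "abusive"]
y_pred = ["none", "none", "none", "none"]
score = macro_f1(y_true, y_pred)
```

In this toy case accuracy is 0.75, but the "abusive" class has an F1 of 0 and the "none" class an F1 of 6/7, so the macro average is 3/7 (about 0.43). This is why macro-averaged F1 is a stricter measure than accuracy when classes are imbalanced.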
Abstract
This paper addresses the problem of abusive comment detection in low-resource Indic languages. Abusive comments are statements that are offensive to a person or a group of people. These comments target individuals on the basis of ethnicity, gender, caste, race, sexuality, etc. Abusive comment detection is a significant problem, especially with the recent rise in social media users. This paper presents the approach used by our team, Optimize_Prime, in the ACL 2022 shared task "Abusive Comment Detection in Tamil." The task is to detect and classify YouTube comments in Tamil and code-mixed Tamil-English into multiple categories. We used three methods to optimize our results: ensemble models, recurrent neural networks, and Transformers. On the Tamil data, MuRIL and XLM-RoBERTa were our best-performing models, with a macro-averaged F1 score of 0.43.…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining
