bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments
Vitthal Bhandari, Poonam Goyal

TL;DR
This paper presents a system using pretrained transformer models to detect homophobia and transphobia in social media comments, demonstrating effectiveness across multiple languages and addressing class imbalance.
Contribution
It introduces a multilingual transformer-based approach with data augmentation for offensive content detection in social media comments, applied to English and Tamil datasets.
Findings
Achieved top ranks in the shared task for English, Tamil, and code-mixed Tamil-English comments.
Demonstrated the effectiveness of pretrained models like mBERT on real-world social media data.
Showed improved performance with data augmentation techniques.
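The findings mention improved performance but do not name a metric. With imbalanced classes, plain accuracy can look good for a model that ignores the minority classes. This is why macro-averaged F1 is a common choice for this kind of task. The sketch below assumes that metric, which this page does not state. The helper `macro_f1` is hypothetical and is intended to mirror scikit-learn's `f1_score(..., average="macro")`, with undefined precision or recall scored as 0. The labels are made up.

```python
# Hypothetical helper: macro-averaged F1 (an assumed metric, not stated on this page).
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # A zero denominator (undefined precision or recall) is treated as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: a classifier that always predicts the majority class.
y_true = ["none"] * 8 + ["homophobic", "transphobic"]
y_pred = ["none"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.296
```

In this example the majority-class predictor reaches 80% accuracy. Its macro-F1 is only about 0.30 because both minority classes score an F1 of 0. Macro-F1 therefore rewards improvements on the rare classes.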
Abstract
Online social networks are ubiquitous and user-friendly. Nevertheless, it is vital to detect and moderate offensive content to maintain decency and empathy. However, mining social media texts is a complex task since users don't adhere to any fixed patterns. Comments can be written in any combination of languages and many of them may be low-resource. In this paper, we present our system for the LT-EDI shared task on detecting homophobia and transphobia in social media comments. We experiment with a number of monolingual and multilingual transformer-based models such as mBERT along with a data augmentation technique for tackling class imbalance. Such pretrained large models have recently shown tremendous success on a variety of benchmark tasks in natural language processing. We observe their performance on a carefully annotated, real-life dataset of YouTube comments in English as well…
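The abstract mentions a data augmentation technique for class imbalance but does not say which one. The sketch below illustrates the problem it targets using plain random oversampling instead. This is a simpler technique swapped in here, not the authors' method. The helper `oversample_minority`, the toy comments, and the label names are all hypothetical.

```python
import random
from collections import Counter


# Hypothetical helper: random oversampling, used here as a simple stand-in
# for the paper's (unspecified) data augmentation technique.
def oversample_minority(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Group comment texts by their label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    out_texts, out_labels = list(texts), list(labels)
    for label, examples in by_label.items():
        # Sample with replacement as many copies as this class is short by.
        extra = rng.choices(examples, k=target - counts[label])
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels


# Toy, made-up comments and labels.
texts = ["c1", "c2", "c3", "c4", "c5", "c6"]
labels = ["none", "none", "none", "none", "homophobic", "transphobic"]
aug_texts, aug_labels = oversample_minority(texts, labels)
print(Counter(aug_labels))  # each label now appears 4 times
```

Duplicating examples only rebalances the label counts and adds no new wording. Augmentation methods that generate paraphrased or translated variants also aim to add lexical diversity. In either case, resampling belongs on the training split only, so that evaluation reflects the original class distribution.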
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Text Readability and Simplification
Methods: mBERT
