Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kayalvizhi Sampath, Durairaj Thenmozhi, Sathiyaraj Thangasamy, Rajendran Nallathambi, John Phillip McCrae

TL;DR
This paper introduces a new multilingual dataset with expert annotations for identifying homophobic and transphobic comments on YouTube, along with baseline models, to aid automatic detection of hate speech against LGBT+ communities.
Contribution
It provides the first expert-labelled, multilingual dataset for online homophobia and transphobia detection, including a hierarchical taxonomy and baseline models.
Findings
15,141 annotated comments in the dataset
High inter-annotator agreement achieved
Baseline models demonstrate effective initial detection performance
Abstract
The increased proliferation of abusive content on social media platforms has a negative impact on online users. The dread, dislike, discomfort, or mistrust of lesbian, gay, transgender or bisexual persons is defined as homophobia/transphobia. Homophobic/transphobic speech is a type of offensive language that may be summarized as hate speech directed toward LGBT+ people, and it has been a growing concern in recent years. Online homophobia/transphobia is a severe societal problem that can make online platforms poisonous and unwelcome to LGBT+ people while also attempting to eliminate equality, diversity, and inclusion. We provide a new hierarchical taxonomy for online homophobia and transphobia, as well as an expert-labelled dataset that will allow homophobic/transphobic content to be automatically identified. We educated annotators and supplied them with comprehensive annotation rules…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Social Media and Politics · Media Influence and Politics
