Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments

Bharathi Raja Chakravarthi; Ruba Priyadharshini; Rahul Ponnusamy; Prasanna Kumar Kumaresan; Kayalvizhi Sampath; Durairaj Thenmozhi; Sathiyaraj Thangasamy; Rajendran Nallathambi; John Phillip McCrae

arXiv:2109.00227 · cs.CL · September 2, 2021 · 56 cites

Open Access

TL;DR

This paper introduces a new multilingual dataset with expert annotations for identifying homophobic and transphobic comments on YouTube, along with baseline models, to aid automatic detection of hate speech against LGBT+ communities.

Contribution

It provides the first expert-labelled, multilingual dataset for online homophobia and transphobia detection, including a hierarchical taxonomy and baseline models.

Findings

01. 15,141 annotated comments in the dataset

02. High inter-annotator agreement achieved

03. Baseline models demonstrate effective initial detection performance

Abstract

The increased proliferation of abusive content on social media platforms has a negative impact on online users. The dread, dislike, discomfort, or mistrust of lesbian, gay, transgender or bisexual persons is defined as homophobia/transphobia. Homophobic/transphobic speech is a type of offensive language that may be summarized as hate speech directed toward LGBT+ people, and it has been a growing concern in recent years. Online homophobia/transphobia is a severe societal problem that can make online platforms poisonous and unwelcome to LGBT+ people while also attempting to eliminate equality, diversity, and inclusion. We provide a new hierarchical taxonomy for online homophobia and transphobia, as well as an expert-labelled dataset that will allow homophobic/transphobic content to be automatically identified. We educated annotators and supplied them with comprehensive annotation rules…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Social Media and Politics · Media Influence and Politics