cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages

Sidney G.-J. Wong; Matthew Durward

arXiv:2401.15777 · cs.CL · January 30, 2024 · 1 citation

TL;DR

This paper presents a transformer-based system for detecting anti-LGBTQ+ hate speech across ten under-resourced languages, incorporating script-switching data to improve multilingual social media comment classification.

Contribution

The study introduces a multilingual classification model with domain adaptation using synthetic and organic script-switched data for under-resourced languages.
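To make "organic" script-switching concrete: such comments mix more than one writing system, for example Devanagari and Latin letters, in a single post. The sketch below is one way to flag these comments in labelled data. It assumes that the first word of a character's Unicode name (e.g. `DEVANAGARI`, `TELUGU`, `LATIN`) is a usable proxy for its script. The helper names `scripts_in` and `is_script_switched` are hypothetical and are not taken from the paper.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Return the scripts used by the alphabetic characters in `text`.

    Approximates a character's script by the first word of its Unicode
    name, e.g. 'DEVANAGARI LETTER KA' -> 'DEVANAGARI'. Combining vowel
    signs, digits, punctuation and emoji are not alphabetic, so they
    are ignored.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_script_switched(text: str) -> bool:
    """Treat a comment as script-switched if it uses more than one script."""
    return len(scripts_in(text)) > 1
```

On a list of training comments, `sum(is_script_switched(c) for c in comments)` gives a rough count of organic script-switched instances. Unicode-name prefixes are a heuristic. A production system might prefer the Unicode `Script` property from a dedicated library.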

Findings

01. Ranked second for Gujarati and Telugu.

02. Script-switching data may improve language detection.

03. Performance varies across language conditions.
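Comparing performance "across language conditions" requires a per-language score. This is a minimal sketch of macro-averaged F1 computed separately for each language condition. It assumes macro F1 as the metric, which is common for this kind of shared task but is not stated on this page. The label strings in the usage example are placeholders, not the task's actual label set.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over labels seen in gold or pred."""
    labels = set(gold) | set(pred)
    f1s = []
    for lab in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == lab and p == lab for g, p in pairs)
        fp = sum(g != lab and p == lab for g, p in pairs)
        fn = sum(g == lab and p != lab for g, p in pairs)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def macro_f1_by_language(records):
    """records: iterable of (language, gold_label, predicted_label) triples."""
    grouped = defaultdict(lambda: ([], []))
    for lang, gold_label, pred_label in records:
        grouped[lang][0].append(gold_label)
        grouped[lang][1].append(pred_label)
    return {lang: macro_f1(g, p) for lang, (g, p) in grouped.items()}
```

For example, `macro_f1_by_language([("te", "a", "a"), ("te", "b", "b"), ("gu", "a", "b")])` scores Telugu at 1.0 and Gujarati at 0.0. Macro averaging weights rare classes equally with common ones, which matters when hateful comments are a minority class.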

Abstract

This paper describes our system for detecting homophobia/transphobia in social media comments, developed as part of the shared task at LT-EDI-2024. We took a transformer-based approach to develop a multiclass classification model for ten language conditions (English, Spanish, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, Tulu, and Telugu). We introduced synthetic and organic instances of script-switched language data during domain adaptation to mirror the linguistic realities of social media language, as seen in the labelled training data. Our system ranked second for Gujarati and Telugu, with varying levels of performance for the other language conditions. The results suggest that incorporating elements of paralinguistic behaviour, such as script-switching, may improve the performance of language detection systems, especially for under-resourced language conditions.
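One plausible way to create *synthetic* script-switched instances is to romanize a random fraction of native-script tokens while keeping the original label. The sketch below shows this idea under explicit assumptions. The `ROMANIZED` lexicon and the `synthesize_script_switch` and `augment` helpers are hypothetical. A real pipeline would likely use a proper transliteration tool. This page does not describe the authors' actual generation procedure, so this is an illustration, not their method.

```python
import random

# Hypothetical Hindi -> romanized lexicon, for illustration only.
ROMANIZED = {"यह": "yah", "बहुत": "bahut", "बुरा": "bura", "है": "hai"}


def synthesize_script_switch(text, lexicon, rate=0.5, seed=0):
    """Replace each token found in `lexicon` with its romanized form,
    with probability `rate`. A fixed seed keeps the output reproducible."""
    rng = random.Random(seed)
    out = []
    for tok in text.split():
        if tok in lexicon and rng.random() < rate:
            out.append(lexicon[tok])
        else:
            out.append(tok)
    return " ".join(out)


def augment(dataset, lexicon, rate=0.5, seed=0):
    """Append a script-switched copy of each (text, label) pair that the
    transformation actually changed. The label is carried over unchanged."""
    synthetic = []
    for i, (text, label) in enumerate(dataset):
        new_text = synthesize_script_switch(text, lexicon, rate, seed + i)
        if new_text != text:
            synthetic.append((new_text, label))
    return dataset + synthetic
```

With `rate=1.0`, every token in the lexicon is romanized, so "यह बहुत बुरा है" becomes "yah bahut bura hai". Lower rates produce mixed-script comments closer to what appears organically on social media.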

Taxonomy

Topics: Hate Speech and Cyberbullying Detection