cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models

Sidney G.-J. Wong; Matthew Durward; Benjamin Adams; Jonathan Dunn

arXiv:2308.10370·cs.CL·August 28, 2023

TL;DR

This paper presents a multilingual transformer-based system for detecting homophobia and transphobia in social media comments, demonstrating that spatio-temporal retraining improves classification accuracy across five languages.

Contribution

It introduces a multilingual BERT-based model, retrained on spatially and temporally relevant social media data, for detecting homophobic and transphobic comments. Its seven-label Malayalam classifier ranked first of six in the shared task on weighted macro-averaged F1.

Findings

01

Spatio-temporal retraining improves classification performance across languages.

02

The seven-label Malayalam classifier ranked first of six systems on weighted macro-averaged F1 score.

03

Transformer models are sensitive to register-specific and language-specific retraining.

Abstract

This paper describes our multiclass classification system developed as part of the LT-EDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-language pretrained language model, XLM-RoBERTa, with spatially and temporally relevant social media language data. We also retrained a subset of models with simulated script-mixed social media language data with varied performance. We developed the best performing seven-label classification system for Malayalam based on weighted macro-averaged F1 score (ranked first out of six) with variable performance for other language and class-label conditions. We found the inclusion of this spatio-temporal data improved the classification performance for all language…
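
The abstract ranks systems by an averaged F1 score over several class labels. As a minimal sketch of how such averages are commonly computed (not the shared task's official scorer), the Python below derives per-class F1 and then takes either an unweighted (macro) mean or a support-weighted mean. The function names and the three example labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in the gold or predicted lists."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's count in the gold labels."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Illustrative labels only; the shared task's label sets differ.
gold = ["none", "none", "none", "homophobia", "transphobia"]
pred = ["none", "none", "homophobia", "homophobia", "none"]

print(round(macro_f1(gold, pred), 4))     # 0.4444
print(round(weighted_f1(gold, pred), 4))  # 0.5333
```

In this toy example the rare class that is never predicted correctly pulls the macro average down more than the weighted one, which is why the choice of averaging matters for imbalanced label sets like those typical of hate speech data.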

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Text Readability and Simplification · Natural Language Processing Techniques