KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model

Barathi Ganesh HB; Michal Ptaszynski; Rene Melendez; Juuso Eronen

arXiv:2605.13415·cs.CL·May 19, 2026

TL;DR

This paper introduces a comprehensive multilingual framework for detecting reclaimed slurs in social media, combining data augmentation, transfer learning, and threshold optimization to improve accuracy across languages.

Contribution

The paper develops a multi-stage approach that integrates data augmentation, transfer learning, and threshold tuning, and it systematically evaluates multilingual embedding models.

Findings

01. XLM-RoBERTa was identified as the best foundation model.

02. Back-translation tripled training data while maintaining semantics.

03. Threshold optimization improved F1 scores by 2-5%.
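
The threshold optimization in the third finding can be illustrated with a minimal sketch. The page does not describe how the authors tune the threshold, so everything here is an assumption: a grid search over candidate cut-offs on held-out validation probabilities, scored by macro-averaged F1 for a binary task. The names `macro_f1` and `best_threshold` are hypothetical and are not taken from the paper or its repository.

```python
def macro_f1(preds, labels):
    """Macro-averaged F1 over the two classes 0 and 1."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for p, y in zip(preds, labels) if p == cls and y == cls)
        fp = sum(1 for p, y in zip(preds, labels) if p == cls and y != cls)
        fn = sum(1 for p, y in zip(preds, labels) if p != cls and y == cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_threshold(probs, labels, candidates=None):
    """Return (threshold, macro_f1) for the best candidate cut-off.

    Assumption: predict class 1 when the positive-class probability
    is >= threshold. Ties keep the lowest candidate.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96)]  # 0.05 .. 0.95
    best_t, best_score = 0.5, -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        score = macro_f1(preds, labels)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy validation set (made-up numbers, not results from the paper).
val_probs = [0.1, 0.2, 0.3, 0.35, 0.4, 0.8]
val_labels = [0, 0, 0, 1, 1, 1]
threshold, score = best_threshold(val_probs, val_labels)
```

On this toy data the default cut-off of 0.5 misses two positives, while a lower cut-off separates the classes. The threshold should be chosen on validation data only and then applied unchanged to the test set.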

Abstract

This paper presents a multi-stage framework for detecting reclaimed slurs in multilingual social media discourse. It addresses the challenge of distinguishing reclamatory from non-reclamatory usage of LGBTQ+-related slurs in English, Spanish, and Italian tweets. The framework handles three intertwined methodological challenges: data scarcity, class imbalance, and cross-linguistic variation in sentiment expression. It integrates data-driven model selection via cross-validation, semantic-preserving augmentation through back-translation, inductive transfer learning with dynamic epoch-level undersampling, and domain-specific knowledge injection via masked language modeling. Eight multilingual embedding models were evaluated systematically, and XLM-RoBERTa was selected as the foundation model based on macro-averaged F1 score. Data augmentation via GPT-4o-mini back-translation to…
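
The abstract mentions dynamic epoch-level undersampling as the response to class imbalance. A minimal sketch of one common reading of that idea follows: at the start of every epoch, each class is randomly downsampled to the size of the smallest class, with a fresh draw per epoch so that different majority-class examples are seen over time. This interpretation, the function name `undersample_for_epoch`, and the seeding scheme are assumptions for illustration; the abstract does not specify the authors' procedure.

```python
import random


def undersample_for_epoch(examples, labels, epoch, seed=0):
    """Return a class-balanced subset of (examples, labels) for one epoch.

    Assumption: every class is cut down to the minority-class count,
    and the random draw depends on the epoch number, so the retained
    majority-class examples change from epoch to epoch.
    """
    rng = random.Random(seed + epoch)  # reproducible per-epoch draw

    # Group example indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    if not by_class:
        return [], []

    n_min = min(len(indices) for indices in by_class.values())

    # Sample n_min indices per class without replacement, then shuffle.
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.sample(indices, n_min))
    rng.shuffle(chosen)

    return [examples[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: 8 majority-class items and 2 minority-class items.
texts = list(range(10))
targets = [0] * 8 + [1] * 2
epoch_texts, epoch_targets = undersample_for_epoch(texts, targets, epoch=0)
```

In a training loop this function would be called once per epoch before batching. Both minority-class examples appear in every epoch, while the majority-class subset is redrawn each time.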


Code & Models

Repositories

rbg-research/MultiPRIDE-Evalita-2026 (GitHub)
