Transfer Language Selection for Zero-Shot Cross-Lingual Abusive Language Detection
Juuso Eronen, Michal Ptaszynski, Fumito Masui, Masaki Arata, Gniewosz Leliwa, Michal Wroczynski

TL;DR
This paper explores how selecting linguistically similar transfer languages improves zero-shot abusive language detection across multiple languages, using cross-lingual transfer learning so that a labeled dataset is not needed for every language.
Contribution
It demonstrates that linguistic similarity metrics can guide the selection of effective transfer languages for improved zero-shot abusive language detection.
Findings
Linguistic similarity correlates with classifier performance.
Optimal transfer languages improve detection accuracy.
Cross-lingual transfer reduces data requirements for low-resource languages.
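The first finding is a correlation between linguistic similarity and classifier performance. A minimal sketch of how such a correlation can be measured, assuming Pearson's coefficient (the exact coefficient used in the paper is not stated here). The similarity and F1 values below are hypothetical toy numbers, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy values, NOT results from the paper:
# similarity of each candidate transfer language to one target language,
# and the zero-shot F1 of a classifier trained on that candidate and tested on the target.
similarity = [0.9, 0.7, 0.5, 0.3]
f1_scores = [0.80, 0.72, 0.61, 0.55]

r = pearson(similarity, f1_scores)
print(f"Pearson r = {r:.3f}")
```

A value of `r` close to 1 would mean that more similar transfer languages tend to yield higher zero-shot scores on the target; a rank-based coefficient such as Spearman's is a reasonable alternative when only the ordering of candidates matters.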
Abstract
We study the selection of transfer languages for automatic abusive language detection. Instead of preparing a dataset for every language, we demonstrate the effectiveness of cross-lingual transfer learning for zero-shot abusive language detection. This way, we can use existing data from higher-resource languages to build better detection systems for low-resource languages. Our datasets cover seven languages from three language families. We measure the distance between the languages using several language similarity measures, in particular by quantifying the World Atlas of Language Structures (WALS). We show that there is a correlation between linguistic similarity and classifier performance. This finding allows us to choose an optimal transfer language for zero-shot abusive language detection.
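The abstract does not spell out how WALS is quantified. A minimal sketch under one assumption: each language is a profile of categorical WALS-style feature values, the distance between two languages is the fraction of features documented for both whose values differ (a normalized Hamming distance), and the chosen transfer language is the candidate closest to the target. The helpers `wals_distance` and `pick_transfer_language`, the language names, the feature ids, and all values are hypothetical placeholders, not the paper's method or real WALS data.

```python
from typing import Dict, Optional

# Hypothetical WALS-style profile: feature id -> categorical value.
Profile = Dict[str, str]


def wals_distance(a: Profile, b: Profile) -> Optional[float]:
    """Normalized Hamming distance over features documented for both languages.

    Returns None when the two profiles share no documented features.
    """
    shared = set(a) & set(b)
    if not shared:
        return None
    mismatches = sum(1 for f in shared if a[f] != b[f])
    return mismatches / len(shared)


def pick_transfer_language(target: str, profiles: Dict[str, Profile]) -> str:
    """Return the candidate language with the smallest distance to the target."""
    best_lang, best_dist = None, float("inf")
    for lang, prof in profiles.items():
        if lang == target:
            continue  # a language is not its own transfer source
        d = wals_distance(profiles[target], prof)
        if d is not None and d < best_dist:
            best_lang, best_dist = lang, d
    if best_lang is None:
        raise ValueError("no candidate shares documented features with the target")
    return best_lang


# Placeholder profiles, NOT real WALS entries.
profiles = {
    "target": {"feat_1": "SOV", "feat_2": "postpositions", "feat_3": "none"},
    "lang_a": {"feat_1": "SVO", "feat_2": "prepositions", "feat_3": "none"},
    "lang_b": {"feat_1": "SOV", "feat_2": "postpositions"},  # feat_3 undocumented
    "lang_c": {"feat_1": "SOV", "feat_2": "prepositions", "feat_3": "definite"},
}

print(pick_transfer_language("target", profiles))
```

Comparing only features documented for both languages is a deliberate choice: WALS coverage is sparse for many languages, and counting an undocumented feature as a mismatch would penalize poorly documented languages rather than genuinely dissimilar ones.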
