TL;DR
This paper systematically compares multilingual generalists, language-specific specialists, and hybrid ensembles for polarization detection across 22 languages (SemEval-2026 Task 9, Subtask 1). Rather than enforcing a single universal architecture, it proposes a language-adaptive framework that chooses among the three per language.
Contribution
It introduces a language-adaptive approach that switches between multilingual generalists, language-specific specialists, and hybrid ensembles according to development-set performance.
Findings
A final macro F1 score of 0.796 was achieved.
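Language-specific specialists outperform generalists on languages with distinct scripts (e.g., Khmer, Odia), while a generalist such as XLM-RoBERTa suffices when its tokenizer aligns with the target text.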
Cross-lingual augmentation via NLLB-200 gave mixed results: it often underperformed native architecture selection and degraded morphologically rich tracks.
Abstract
We present a systematic study of multilingual polarization detection across 22 languages for SemEval-2026 Task 9 (Subtask 1), contrasting multilingual generalists with language-specific specialists and hybrid ensembles. While a standard generalist like XLM-RoBERTa suffices when its tokenizer aligns with the target text, it may struggle with distinct scripts (e.g., Khmer, Odia) where monolingual specialists yield significant gains. Rather than enforcing a single universal architecture, we adopt a language-adaptive framework that switches between multilingual generalists, language-specific specialists, and hybrid ensembles based on development performance. Additionally, cross-lingual augmentation via NLLB-200 yielded mixed results, often underperforming native architecture selection and degrading morphologically rich tracks. Our final system achieves an overall macro-averaged F1 score of…
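The per-language switching described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate strategy has already been scored on a development set for each language, and that selection simply takes the best-scoring candidate. The function name `select_per_language`, the language codes, the candidate labels, and all scores below are hypothetical placeholders, not values from the paper, and the authors' actual selection procedure may differ.

```python
from typing import Dict


def select_per_language(dev_f1: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Pick, for each language, the candidate with the highest dev-set F1.

    dev_f1 maps language -> {candidate name -> dev macro F1}.
    Ties resolve to the candidate listed first for that language.
    """
    selection = {}
    for lang, scores in dev_f1.items():
        # max() returns the first key that reaches the maximum score
        selection[lang] = max(scores, key=scores.get)
    return selection


# Hypothetical development scores, invented for illustration only.
dev_f1 = {
    "khm": {"generalist": 0.61, "specialist": 0.74, "ensemble": 0.70},
    "eng": {"generalist": 0.82, "specialist": 0.80, "ensemble": 0.81},
}

chosen = select_per_language(dev_f1)
for lang, model in chosen.items():
    print(f"{lang}: {model}")
```

With these invented scores, the sketch routes the distinct-script language to the specialist and keeps the generalist elsewhere, mirroring the pattern the abstract reports.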
