Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection
Gretel Liz De la Peña Sarracén, Paolo Rosso, Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto

TL;DR
This paper introduces MIXAG, a novel data augmentation technique based on vicinal risk minimization, to improve few-shot cross-lingual abusive language detection across multiple languages and domains.
Contribution
The paper proposes MIXAG, a new data augmentation method that interpolates instances based on their representation angles, enhancing cross-lingual abusive language detection in low-resource settings.
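To make the idea of interpolating two instances concrete, here is a minimal sketch in plain Python. It is not the paper's MIXAG formula, which this summary does not give. It shows two well-known stand-ins: mixup, a standard vicinal-risk-minimization augmentation that takes a convex combination of two vectors, and spherical linear interpolation (slerp), which places the new point at a chosen fraction of the angle between two representations. The function names, the toy 2-dimensional vectors, and the choice of slerp as the angle-based example are all assumptions made for illustration.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def normalize(u):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in u))
    return [a / norm for a in u]


def mixup(u, v, lam):
    """Standard mixup: lam * u + (1 - lam) * v, with lam in [0, 1]."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]


def slerp(u, v, t):
    """Spherical linear interpolation between the directions of u and v.

    Assumption: used here only as an illustrative angle-based
    interpolation; it is NOT claimed to be the MIXAG formula.
    The result is a unit vector at fraction t of the angle from u to v.
    """
    u, v = normalize(u), normalize(v)
    theta = angle_between(u, v)
    if theta < 1e-12:
        # Directions coincide: fall back to a linear blend.
        return mixup(u, v, 1.0 - t)
    w_u = math.sin((1.0 - t) * theta) / math.sin(theta)
    w_v = math.sin(t * theta) / math.sin(theta)
    return [w_u * a + w_v * b for a, b in zip(u, v)]


if __name__ == "__main__":
    # Toy "representations"; real ones would come from a multilingual encoder.
    x_i, x_j = [1.0, 0.0], [0.0, 1.0]
    print(mixup(x_i, x_j, 0.5))
    print(slerp(x_i, x_j, 0.5))
```

In mixup the (one-hot) labels of the two instances are combined with the same coefficient as the inputs, so the synthetic example carries a soft label. How an angle-based method such as MIXAG assigns labels and picks pairs is not stated in this summary and is left out of the sketch.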
Findings
MIXAG significantly improves detection performance across all target languages.
Data augmentation strategies enhance few-shot cross-lingual abusive language detection.
Domain adaptation reduces false negatives but may decrease precision.
Abstract
Cross-lingual transfer learning from high-resource to medium and low-resource languages has shown encouraging results. However, the scarcity of resources in target languages remains a challenge. In this work, we resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection. For data augmentation, we analyze two existing techniques based on vicinal risk minimization and propose MIXAG, a novel data augmentation method which interpolates pairs of instances based on the angle of their representations. Our experiments involve seven languages typologically distinct from English and three different domains. The results reveal that the data augmentation strategies can enhance few-shot cross-lingual abusive language detection. Specifically, we observe that consistently in all target languages, MIXAG improves significantly in…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Interpreting and Communication in Healthcare · Natural Language Processing Techniques
