XITE: Cross-lingual Interpolation for Transfer using Embeddings
Barah Fazili, Preethi Jyothi

TL;DR
XITE is an embedding-based data augmentation method that improves cross-lingual transfer in multilingual models by creating synthetic data through interpolation of source and target embeddings, boosting performance across diverse languages.
Contribution
The paper introduces XITE, a novel embedding interpolation technique that enhances cross-lingual transfer and maintains high-resource language performance in multilingual models.
Findings
Up to 35.91% improvement in sentiment analysis with XLM-R
Up to 81.16% improvement in natural language inference with XLM-R
Effective across languages like Korean, Arabic, Urdu, and Hindi
Abstract
Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA), prior to interpolation, further boosts performance. Our cross-lingual embedding-based augmentation technique XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference, using XLM-R, for a diverse set of target languages…
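The matching and interpolation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes sentence embeddings are already available as fixed-size vectors (for example, pooled XLM-R outputs), uses cosine similarity for the matching step, and mixes the two embeddings with a single hypothetical coefficient `lam`. The function names and the default `lam=0.5` are illustrative choices, and the LDA projection step is omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def match_and_interpolate(src_emb, src_labels, tgt_emb, lam=0.5):
    """Pair each unlabeled target embedding with its most similar labeled
    source (English) embedding, adopt that label, and interpolate the pair.

    src_emb:    (n_src, d) embeddings of labeled source sentences
    src_labels: (n_src,)   task labels of the source sentences
    tgt_emb:    (n_tgt, d) embeddings of unlabeled target sentences
    lam:        hypothetical mixing weight on the source embedding (assumption)
    """
    # Cosine similarity between every target and every source sentence.
    sims = l2_normalize(tgt_emb) @ l2_normalize(src_emb).T  # (n_tgt, n_src)
    nearest = sims.argmax(axis=1)                           # best source per target

    # Linear interpolation of the matched source and target embeddings.
    mixed = lam * src_emb[nearest] + (1.0 - lam) * tgt_emb

    # The synthetic example inherits the label of its matched source sentence.
    return mixed, src_labels[nearest], nearest
```

The returned `mixed` vectors and adopted labels would then serve as synthetic training examples for task-specific fine-tuning. How the interpolated embeddings are fed into the model, and how the mixing weight is chosen, are details this sketch does not cover.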
