XITE: Cross-lingual Interpolation for Transfer using Embeddings

Barah Fazili; Preethi Jyothi

arXiv:2604.23589 · cs.CL · April 28, 2026

TL;DR

XITE is an embedding-based data augmentation method that improves cross-lingual transfer in multilingual models by creating synthetic data through interpolation of source and target embeddings, boosting performance across diverse languages.
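The interpolation step can be sketched in plain Python, together with the nearest-neighbor label adoption XITE uses to pair each target-language sentence with an English training example. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each sentence is already a single pooled embedding vector (for example, a mean-pooled XLM-R representation), that similarity is cosine similarity, and that interpolation uses one fixed weight. The helper names (`cosine`, `nearest_label`, `interpolate`, `augment`) and the weight `lam` are hypothetical. The paper may interpolate at a different granularity (for example, token level) or choose the weight differently.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_label(target: Sequence[float],
                  source_embs: Sequence[Sequence[float]],
                  source_labels: Sequence[str]) -> Tuple[int, str]:
    """Find the most similar English example and adopt its label (hypothetical helper)."""
    best = max(range(len(source_embs)), key=lambda i: cosine(target, source_embs[i]))
    return best, source_labels[best]


def interpolate(source: Sequence[float], target: Sequence[float], lam: float) -> Vector:
    """Linear interpolation lam * source + (1 - lam) * target."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]


def augment(target_embs: Sequence[Sequence[float]],
            source_embs: Sequence[Sequence[float]],
            source_labels: Sequence[str],
            lam: float = 0.5) -> List[Tuple[Vector, str]]:
    """Build (synthetic embedding, adopted label) pairs for unlabeled target sentences."""
    synthetic = []
    for t in target_embs:
        idx, label = nearest_label(t, source_embs, source_labels)
        synthetic.append((interpolate(source_embs[idx], t, lam), label))
    return synthetic


if __name__ == "__main__":
    # Toy 2-d "embeddings": one English positive and one English negative example.
    english = [[1.0, 0.0], [0.0, 1.0]]
    labels = ["positive", "negative"]
    unlabeled_target = [[0.9, 0.1]]
    for vec, label in augment(unlabeled_target, english, labels, lam=0.5):
        print(label, vec)
```

The synthetic pairs would then serve as extra task-specific fine-tuning data. Keeping retrieval and interpolation as separate functions makes it easy to swap in a different similarity measure or weight schedule.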

Contribution

The paper introduces XITE, a novel embedding interpolation technique that enhances cross-lingual transfer and maintains high-resource language performance in multilingual models.

Findings

1. Up to 35.91% improvement in sentiment analysis (with XLM-R)
2. Up to 81.16% improvement in natural language inference (with XLM-R)
3. Effective across target languages such as Korean, Arabic, Urdu, and Hindi

Abstract

Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA), prior to interpolation, further boosts performance. Our cross-lingual embedding-based augmentation technique XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference, using XLM-R, for a diverse set of target languages…
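The LDA projection step can be sketched with NumPy. This is one possible reading of "projecting the target text into a language-rich subspace", not the authors' method. It assumes LDA is fit on sentence embeddings whose class labels are language IDs, so the discriminant directions capture language identity. It also assumes the target embedding is mapped onto the span of those directions with an orthogonal projector, which keeps it in the original embedding dimension so it can still be interpolated with the source embedding. The function names (`fit_lda_directions`, `projector`, `lda_interpolate`), the regularizer `eps`, and the weight `lam` are hypothetical.

```python
import numpy as np


def fit_lda_directions(X: np.ndarray, lang_ids: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Return a (d, k) matrix of LDA directions, k = n_languages - 1.

    Solves the generalized eigenproblem Sb w = lambda Sw w via Cholesky whitening.
    """
    classes = np.unique(lang_ids)
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-language scatter
    Sb = np.zeros((d, d))  # between-language scatter
    for c in classes:
        Xc = X[lang_ids == c]
        mu_c = Xc.mean(axis=0)
        centered = Xc - mu_c
        Sw += centered.T @ centered
        diff = (mu_c - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += eps * np.eye(d)  # regularize so the Cholesky factor exists
    L = np.linalg.cholesky(Sw)  # Sw = L @ L.T
    L_inv = np.linalg.inv(L)
    M = L_inv @ Sb @ L_inv.T  # symmetric matrix with the same eigenvalues as Sw^-1 Sb
    eigvals, V = np.linalg.eigh(M)  # eigenvalues in ascending order
    k = len(classes) - 1
    top = V[:, np.argsort(eigvals)[::-1][:k]]
    return L_inv.T @ top  # map back: w = L^-T v


def projector(W: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the column span of W."""
    Q, _ = np.linalg.qr(W)
    return Q @ Q.T


def lda_interpolate(source: np.ndarray, target: np.ndarray, W: np.ndarray,
                    lam: float = 0.5) -> np.ndarray:
    """Project the target into the LDA subspace, then interpolate with the source."""
    return lam * source + (1.0 - lam) * (projector(W) @ target)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two "languages" separated along the first axis.
    X = np.vstack([rng.normal(size=(100, 3)),
                   rng.normal(size=(100, 3)) + np.array([5.0, 0.0, 0.0])])
    ids = np.array([0] * 100 + [1] * 100)
    W = fit_lda_directions(X, ids)
    print(lda_interpolate(np.array([1.0, 2.0, 3.0]), np.array([0.5, -1.0, 2.0]), W))
```

The Cholesky whitening avoids the non-symmetric matrix `inv(Sw) @ Sb`, so `numpy.linalg.eigh` returns real eigenvectors directly. With N languages, LDA yields at most N - 1 directions, so the projected subspace is small relative to the embedding dimension.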
