Budget-Xfer: Budget-Constrained Source Language Selection for Cross-Lingual Transfer to African Languages
Tewodros Kederalah Idris, Roald Eiselen, Prasenjit Mitra

TL;DR
Budget-Xfer is a framework for choosing source languages and allocating training data among them under a fixed annotation budget in cross-lingual transfer learning, with the goal of improving NLP for low-resource African languages.
Contribution
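It formulates multi-source cross-lingual transfer as a budget-constrained resource allocation problem and evaluates four allocation strategies on named entity recognition and sentiment analysis for three African target languages (Hausa, Yoruba, Swahili) with two multilingual models, for a total of 288 experiments.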
Findings
Multi-source transfer outperforms single-source transfer.
Differences among multi-source strategies are modest and often non-significant.
The usefulness of embedding similarity as a source-selection proxy is task-dependent; in some settings it is less effective than random selection.
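To make the last finding concrete, here is a minimal sketch contrasting a similarity-based selection proxy with a random baseline. It assumes each language is represented by a fixed embedding vector and that similarity means cosine similarity; the function names, the language labels `src_a`/`src_b`/`src_c`, and the toy vectors are hypothetical illustrations, not the paper's embeddings or code.

```python
import math
import random
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_by_similarity(target: Sequence[float],
                         sources: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Similarity proxy (hypothetical): keep the k sources whose embeddings are closest to the target."""
    ranked = sorted(sources, key=lambda lang: cosine(target, sources[lang]), reverse=True)
    return ranked[:k]


def select_random(sources: Dict[str, Sequence[float]], k: int, seed: int = 0) -> List[str]:
    """Random baseline (hypothetical): k sources drawn uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(sources), k)


# Toy 3-d "language embeddings", made up for illustration only (not real data).
target = [1.0, 0.0, 0.0]
sources = {"src_a": [0.9, 0.1, 0.0], "src_b": [0.0, 1.0, 0.0], "src_c": [0.7, 0.7, 0.0]}

print(select_by_similarity(target, sources, k=2))  # ['src_a', 'src_c']
print(select_random(sources, k=2))
```

A high cosine score only says the embeddings are close; it does not guarantee that the source's labeled data transfers well for a given task, which is consistent with the finding that this proxy can lose to random selection in some settings.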
Abstract
Cross-lingual transfer learning enables NLP for low-resource languages by leveraging labeled data from higher-resource sources, yet existing comparisons of source language selection strategies do not control for total training data, confounding language selection effects with data quantity effects. We introduce Budget-Xfer, a framework that formulates multi-source cross-lingual transfer as a budget-constrained resource allocation problem. Given a fixed annotation budget B, our framework jointly optimizes which source languages to include and how much data to allocate from each. We evaluate four allocation strategies across named entity recognition and sentiment analysis for three African target languages (Hausa, Yoruba, Swahili) using two multilingual models, conducting 288 experiments. Our results show that (1) multi-source transfer significantly outperforms single-source transfer…
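The core constraint can be illustrated with a short sketch: every strategy distributes the same budget B over source languages, so the set of included sources is simply the set with a non-zero allocation, and differences in results cannot come from differences in total data. The abstract above does not name its four strategies, so the single-source, uniform, and weight-proportional splits below, along with all function names, are hypothetical examples rather than the paper's method.

```python
from typing import Dict, List


def single_source(best: str, budget: int) -> Dict[str, int]:
    """Single-source baseline: the whole budget goes to one language."""
    return {best: budget}


def uniform_split(sources: List[str], budget: int) -> Dict[str, int]:
    """Split the budget as evenly as possible; the first `rem` sources get one extra example."""
    base, rem = divmod(budget, len(sources))
    return {lang: base + (1 if i < rem else 0) for i, lang in enumerate(sources)}


def proportional_split(weights: Dict[str, float], budget: int) -> Dict[str, int]:
    """Allocate in proportion to non-negative weights, rounding with the largest-remainder rule."""
    total = sum(weights.values())
    raw = {lang: budget * w / total for lang, w in weights.items()}
    alloc = {lang: int(raw[lang]) for lang in raw}  # int() floors non-negative floats
    leftover = budget - sum(alloc.values())
    # Hand the remaining examples to the sources with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda lang: raw[lang] - alloc[lang], reverse=True)
    for lang in by_remainder[:leftover]:
        alloc[lang] += 1
    return alloc


def check_budget(alloc: Dict[str, int], budget: int) -> None:
    """Every strategy must spend exactly B examples, so data quantity is held fixed."""
    assert sum(alloc.values()) == budget
    assert all(n >= 0 for n in alloc.values())


B = 1000
for alloc in (single_source("src_a", B),
              uniform_split(["src_a", "src_b", "src_c"], B),
              proportional_split({"src_a": 3.0, "src_b": 1.0}, B)):
    check_budget(alloc, B)
    print(alloc)
```

With B = 1000 this prints `{'src_a': 1000}`, then `{'src_a': 334, 'src_b': 333, 'src_c': 333}`, then `{'src_a': 750, 'src_b': 250}`. Integer rounding matters here because a naive per-source floor can silently spend less than B, which would reintroduce the data-quantity confound the framework is designed to remove.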