Can Embedding Similarity Predict Cross-Lingual Transfer? A Systematic Study on African Languages
Tewodros Kederalah Idris, Prasenjit Mitra, Roald Eiselen

TL;DR
This study systematically evaluates embedding similarity metrics to predict cross-lingual transfer success for African languages, providing practical guidance for source language selection in low-resource NLP tasks.
Contribution
The paper analyzes five embedding similarity metrics across 12 African languages and three African-centric multilingual models, identifying which metrics reliably predict transfer and showing that they must be validated separately for each model.
Findings
Cosine gap and retrieval metrics predict transfer success reliably.
The CKA metric shows negligible predictive power.
Embedding metrics achieve predictive power comparable to URIEL linguistic typology.
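To make the metric names above concrete, here is a minimal NumPy sketch of three of them, computed on two matrices of parallel sentence embeddings (row i of each matrix is assumed to be the same sentence in the source and target language). The definitions are assumptions, since this summary does not spell them out: cosine gap is taken to be the mean cosine similarity of aligned pairs minus that of non-aligned pairs, P@1 is nearest-neighbor retrieval accuracy, and CKA is the standard linear variant. The function names and the random demo data are hypothetical and are not taken from the paper.

```python
import numpy as np


def _normalize(x):
    # Scale each row to unit L2 norm so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cosine_gap(src, tgt):
    """Assumed definition: mean cosine of aligned pairs minus mean cosine of all other pairs."""
    sim = _normalize(src) @ _normalize(tgt).T
    n = sim.shape[0]
    aligned = np.trace(sim) / n
    non_aligned = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return float(aligned - non_aligned)


def precision_at_1(src, tgt):
    """Fraction of source rows whose nearest target row (by cosine) is the aligned one."""
    sim = _normalize(src) @ _normalize(tgt).T
    return float(np.mean(sim.argmax(axis=1) == np.arange(sim.shape[0])))


def linear_cka(x, y):
    """Linear centered kernel alignment between two representation matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return float(cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")))


# Hypothetical demo: the "target" embeddings are a noisy copy of the "source" ones.
rng = np.random.default_rng(0)
source_emb = rng.normal(size=(50, 16))
target_emb = source_emb + 0.1 * rng.normal(size=(50, 16))

print("cosine gap:", cosine_gap(source_emb, target_emb))
print("P@1:       ", precision_at_1(source_emb, target_emb))
print("linear CKA:", linear_cka(source_emb, target_emb))
```

In a source-selection setting, each candidate source language would get one such score against the target language, and the scores would then be correlated with observed transfer performance.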
Abstract
Cross-lingual transfer is essential for building NLP systems for low-resource African languages, but practitioners lack reliable methods for selecting source languages. We systematically evaluate five embedding similarity metrics across 816 transfer experiments spanning three NLP tasks, three African-centric multilingual models, and 12 languages from four language families. We find that cosine gap and retrieval-based metrics (P@1, CSLS) reliably predict transfer success, while CKA shows negligible predictive power. Critically, correlation signs reverse when results are pooled across models (Simpson's paradox), so practitioners must validate each metric per model. Embedding metrics achieve predictive power comparable to URIEL linguistic typology. Our results provide concrete guidance for source language selection and highlight the importance of model-specific analysis.
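The sign reversal under pooling can be illustrated with a small synthetic example. The numbers below are invented purely for illustration and are not results from the paper; the model names are hypothetical, and Pearson correlation is used only because it is available in NumPy (the study's own correlation statistic is not stated in this summary). Within each model, higher similarity goes with higher transfer score, yet the pooled correlation is negative because the models sit at different baseline levels.

```python
import numpy as np


def pearson(x, y):
    # Pearson correlation coefficient between two 1-D arrays.
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic (similarity, transfer score) pairs for three hypothetical models.
# Each model has a positive within-model trend but a different baseline.
base = np.array([0.0, 1.0, 2.0, 3.0])
groups = {
    "model_a": (base + 0.0, base + 20.0),
    "model_b": (base + 5.0, base + 10.0),
    "model_c": (base + 10.0, base + 0.0),
}

# Correlation computed separately for each model.
per_model = {name: pearson(x, y) for name, (x, y) in groups.items()}

# Correlation computed after pooling all models together.
pooled_x = np.concatenate([x for x, _ in groups.values()])
pooled_y = np.concatenate([y for _, y in groups.values()])
pooled = pearson(pooled_x, pooled_y)

for name, r in per_model.items():
    print(f"{name}: r = {r:+.2f}")
print(f"pooled:  r = {pooled:+.2f}")
```

Grouping by model before correlating, as in `per_model`, is the kind of per-model validation the abstract recommends.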
Taxonomy
Topics: Natural Language Processing Techniques · Multilingual Education and Policy · Topic Modeling
