Why Better Cross-Lingual Alignment Fails for Better Cross-Lingual Transfer: Case of Encoders
Yana Veitsman, Yihong Liu, Hinrich Schütze

TL;DR
This paper investigates why improved cross-lingual embedding alignment does not necessarily lead to better transfer performance, revealing the orthogonality of alignment and task objectives and providing practical fine-tuning guidelines.
Contribution
It demonstrates the disconnect between alignment quality and transfer success, analyzing gradient relationships and offering strategies for effective cross-lingual model fine-tuning.
Findings
Embedding distances are unreliable predictors of transfer performance.
Alignment and task gradients are often orthogonal, limiting transfer benefits.
Careful loss selection is crucial for effective cross-lingual transfer.
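The second finding rests on comparing the gradient of the task loss with the gradient of the alignment loss. This summary does not give the paper's exact measurement procedure, so the following is only a minimal sketch of one common way to make such a comparison: flatten both gradients over the same parameters and take their cosine similarity, alongside each gradient's norm. The function names and the toy gradient values are hypothetical, chosen for illustration.

```python
import numpy as np


def flatten_grads(grads):
    """Concatenate a list of per-parameter gradient arrays into one vector."""
    return np.concatenate([np.asarray(g, dtype=np.float64).ravel() for g in grads])


def gradient_cosine(task_grads, align_grads, eps=1e-12):
    """Cosine similarity between two gradients taken over the same parameters.

    Values near 0 mean the two losses push the parameters in nearly
    orthogonal directions; values near 1 mean they agree; values near -1
    mean they conflict.
    """
    t = flatten_grads(task_grads)
    a = flatten_grads(align_grads)
    return float(t @ a / (np.linalg.norm(t) * np.linalg.norm(a) + eps))


def gradient_norm(grads):
    """L2 norm of the flattened gradient (its overall magnitude)."""
    return float(np.linalg.norm(flatten_grads(grads)))


# Toy gradients for a model with two parameter tensors (made-up values).
task_grads = [np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([2.0])]
align_grads = [np.array([[0.0, 3.0], [0.0, 0.0]]), np.array([0.0])]

print("cosine(task, align):", gradient_cosine(task_grads, align_grads))
print("cosine(task, task): ", gradient_cosine(task_grads, task_grads))
print("norms:", gradient_norm(task_grads), gradient_norm(align_grads))
```

In this toy case the two gradients touch disjoint coordinates, so their cosine similarity is zero: an alignment update neither helps nor hurts the task loss to first order. In a real setup the gradient lists would come from backpropagating each loss separately through the same encoder on the same batch.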
Abstract
Better cross-lingual alignment is often assumed to yield better cross-lingual transfer. However, explicit alignment techniques -- despite increasing embedding similarity -- frequently fail to improve token-level downstream performance. In this work, we show that this mismatch arises because alignment and downstream task objectives are largely orthogonal, and because the downstream benefits from alignment vary substantially across languages and task types. We analyze four XLM-R encoder models aligned on different language pairs and fine-tuned for either POS Tagging or Sentence Classification. Using representational analyses, including embedding distances, gradient similarities, and gradient magnitudes for both task and alignment losses, we find that: (1) embedding distances alone are unreliable predictors of improvements (or degradations) in task performance and (2) alignment and task…
Taxonomy
Topics: Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education · Topic Modeling
