Understanding Cross-Lingual Alignment -- A Survey
Katharina Hämmerl, Jindřich Libovický, Alexander Fraser

TL;DR
This survey comprehensively reviews techniques for cross-lingual alignment in multilingual models, categorizing methods, summarizing key insights, and discussing applications across different model architectures.
Contribution
It provides a taxonomy of cross-lingual alignment methods, summarizes extensive research findings, and discusses future directions for various model types.
Findings
An effective trade-off between language-neutral and language-specific information is crucial.
Various techniques have been developed with differing strengths and limitations.
Insights can be extended beyond encoder models to other architectures.
Abstract
Cross-lingual alignment, the meaningful similarity of representations across languages in multilingual language models, has been an active field of research in recent years. We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field. We present different understandings of cross-lingual alignment and their limitations. We provide a qualitative summary of results from a large number of surveyed papers. Finally, we discuss how these insights may be applied not only to encoder models, where this topic has been heavily studied, but also to encoder-decoder or even decoder-only models, and argue that an effective trade-off between language-neutral and language-specific information is key.
Taxonomy
Topics: linguistics and terminology studies
