Similarity of Sentence Representations in Multilingual LMs: Resolving Conflicting Literature and Case Study of Baltic Languages
Maksym Del, Mark Fishel

TL;DR
This study investigates how multilingual language models represent different languages, with a case study of Baltic languages. It finds that most languages share a common cross-lingual space while some do not, and that Baltic languages belong to this shared space under certain conditions.
Contribution
The paper clarifies conflicting prior results by showing that language representations do converge to a shared space when a different pooling strategy or similarity index is chosen, and it provides a detailed analysis of Baltic languages.
Findings
Most languages share a joint cross-lingual space in multilingual LMs.
Baltic languages do belong to the shared cross-lingual space under certain conditions.
Different pooling strategies and similarity indices affect the observed language convergence.
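To make the last finding concrete, the sketch below shows how a pooling strategy and a similarity index combine into one cross-lingual similarity score. The summary here does not name the specific strategies or indices the authors compared, so the choices are assumptions made for illustration: mean pooling and first-token ("CLS") pooling as two pooling strategies, and linear CKA as one possible similarity index. The function names and the random stand-in data are hypothetical and do not come from the paper.

```python
import numpy as np


def mean_pool(token_states, mask):
    """Average token vectors over non-padding positions.

    token_states: (n_sentences, n_tokens, dim) hidden states from one layer.
    mask: (n_sentences, n_tokens), 1 for real tokens and 0 for padding.
    """
    mask = mask[..., None].astype(token_states.dtype)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return (token_states * mask).sum(axis=1) / counts


def cls_pool(token_states):
    """Use the first token's vector as the sentence representation."""
    return token_states[:, 0, :]


def linear_cka(x, y):
    """Linear CKA between two (n_sentences, dim) matrices.

    Row i of x and row i of y are assumed to encode the same sentence
    in two different languages (parallel data).
    """
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


if __name__ == "__main__":
    # Random stand-ins for per-token hidden states of parallel sentences
    # in two languages (assumption: 50 sentences, 12 tokens, 16 dimensions).
    rng = np.random.default_rng(0)
    states_a = rng.normal(size=(50, 12, 16))
    states_b = rng.normal(size=(50, 12, 16))
    mask = np.ones((50, 12))

    # The same pair of languages can score differently per pooling choice.
    print("mean pooling:", linear_cka(mean_pool(states_a, mask), mean_pool(states_b, mask)))
    print("CLS pooling: ", linear_cka(cls_pool(states_a), cls_pool(states_b)))
```

With real hidden states in place of the random arrays, running both pooling functions through the same index for one language pair shows how much the measured convergence depends on these two choices.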
Abstract
Low-resource languages, such as Baltic languages, benefit from Large Multilingual Models (LMs) that possess remarkable cross-lingual transfer performance capabilities. This work is an interpretation and analysis study into cross-lingual representations of Multilingual LMs. Previous works hypothesized that these LMs internally project representations of different languages into a shared cross-lingual space. However, the literature produced contradictory results. In this paper, we revisit the prior work claiming that "BERT is not an Interlingua" and show that different languages do converge to a shared space in such language models with another choice of pooling strategy or similarity index. Then, we perform cross-lingual representational analysis for the two most popular multilingual LMs employing 378 pairwise language comparisons. We discover that while most languages share joint…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
