The Less the Merrier? Investigating Language Representation in Multilingual Models
Hellina Hailu Nigatu, Atnafu Lambebo Tonja, Jugal Kalita

TL;DR
This paper investigates how multilingual models represent different languages, especially low-resource ones. It finds that community-centered models perform better for certain languages and offers insights for improving multilingual NLP.
Contribution
It provides an analysis of language support and representation in multilingual models, highlighting the benefits of community-centered approaches for low-resource languages.
Findings
Community-centered models outperform others for low-resource languages.
Models' representations vary across language families and dialects.
Performance on downstream tasks depends on language support and representation.
Abstract
Multilingual Language Models offer a way to incorporate multiple languages in one model and utilize cross-language transfer learning to improve performance for different Natural Language Processing (NLP) tasks. Despite progress in multilingual models, not all languages are supported as well, particularly in low-resource settings. In this work, we investigate the linguistic representation of different languages in multilingual models. We start by asking the question which languages are supported in popular multilingual models and which languages are left behind. Then, for included languages, we look at models' learned representations based on language family and dialect and try to understand how models' learned representations for (1) seen and (2) unseen languages vary across different language groups. In addition, we test and analyze performance on downstream tasks such as text…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
