How does a Multilingual LM Handle Multiple Languages?
Santhosh Kakarla, Gautama Shastry Bulusu Venkata, Aishwarya Gaddam, Maheedhar Sai Omtri Mohan

TL;DR
This paper critically evaluates multilingual language models such as BLOOM and Qwen2 on semantic, syntactic, and cross-lingual tasks. It highlights their strengths in high-resource languages and their limitations in low-resource ones.
Contribution
It provides a comprehensive analysis of the capabilities and limitations of multilingual language models (MLMs) across languages. It also proposes evaluation methods aimed at improving multilingual NLP models.
Findings
Models perform well on high-resource languages.
Models struggle to understand low-resource languages.
Cross-lingual transfer is limited for low-resource languages.
Abstract
Multilingual language models (MLMs) have advanced significantly thanks to rapid progress in natural language processing. Models like BLOOM 1.7B, trained on diverse multilingual datasets, aim to bridge linguistic gaps. However, how well they capture linguistic knowledge remains an open question, particularly for low-resource languages. This study critically examines MLMs' capabilities in multilingual understanding, semantic representation, and cross-lingual knowledge transfer. These models perform well for high-resource languages but struggle with less-represented ones. In addition, traditional evaluation methods often overlook how the models internally encode syntax and semantics. This research addresses these limitations through three objectives. First, it assesses semantic similarity by using cosine similarity to check multilingual word embeddings for consistency. Second, it…
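The first objective relies on cosine similarity between word embeddings. Below is a minimal sketch of that metric in pure Python, assuming each language's embedding of the same concept is already available as a plain list of floats. The function names, the idea of scoring every language against one "anchor" language, and the toy vectors are hypothetical illustrations, not the paper's method or data. Extracting real embeddings from a model such as BLOOM 1.7B is outside the scope of this sketch.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return cos(u, v) = (u . v) / (||u|| * ||v||), a value in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def cross_lingual_consistency(
    anchor: Sequence[float], translations: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Hypothetical helper: score each language's embedding of a concept
    against the anchor-language embedding of the same concept."""
    return {lang: cosine_similarity(anchor, vec) for lang, vec in translations.items()}


# Toy 3-dimensional vectors, invented purely for illustration.
anchor_en = [1.0, 0.0, 1.0]
scores = cross_lingual_consistency(
    anchor_en,
    {"fr": [0.9, 0.1, 1.0], "sw": [0.2, 1.0, 0.1]},
)
print(scores)
```

In this framing, a translation whose embedding points in nearly the same direction as the anchor scores close to 1. A low score for a language would be consistent with the weaker low-resource representations the abstract describes, though with toy vectors it only demonstrates the arithmetic.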
Taxonomy
Topics: Natural Language Processing Techniques · Second Language Learning and Teaching · Linguistic research and analysis
Methods: BLOOM
