Multilingual Large Language Models and Curse of Multilinguality
Daniil Gurgurov, Tanja Bäumel, Tatiana Anikina

TL;DR
This paper provides an overview of multilingual Large Language Models, their architectures, training methods, and the challenge known as the curse of multilinguality, along with current strategies to address this limitation.
Contribution
It offers a comprehensive survey of multilingual LLMs, detailing their technical aspects and discussing approaches to mitigate the curse of multilinguality.
Findings
Encoder-only, decoder-only, and encoder-decoder architectures each have distinct strengths and limitations.
The curse of multilinguality: for a model of fixed capacity, per-language performance degrades as more languages are added.
Current methods show promise in overcoming this challenge.
Abstract
Multilingual Large Language Models (LLMs) have become widely popular among Natural Language Processing (NLP) researchers and practitioners. These models, trained on huge datasets, show proficiency across various languages and demonstrate effectiveness in numerous downstream tasks. This paper navigates the landscape of multilingual LLMs, providing an introductory overview of their technical aspects. It explains underlying architectures, objective functions, pre-training data sources, and tokenization methods. This work explores the unique features of different model types: encoder-only (mBERT, XLM-R), decoder-only (XGLM, PaLM, BLOOM, GPT-3), and encoder-decoder models (mT5, mBART). Additionally, it addresses one of the significant limitations of multilingual LLMs, the curse of multilinguality, and discusses current attempts to overcome it.
Taxonomy
Topics: Natural Language Processing Techniques · Multilingual Education and Policy · Linguistics, Language Diversity, and Identity
Methods: BLOOM
