Exploring the Maze of Multilingual Modeling
Sina Bagheri Nezhad, Ameeta Agrawal

TL;DR
This paper evaluates three major multilingual language models (mBERT, XLM-R, and GPT-3) across a diverse set of languages on two tasks, text classification and text generation. It identifies the key factors shaping their performance, such as data resources, language family, and script type, to guide future improvements.
Contribution
The paper provides a comprehensive analysis of the factors affecting multilingual model performance, highlighting the roles of data resources, language family, and script type, which were previously less well understood.
Findings
Model performance heavily depends on language-specific pretraining data.
General resource availability and linguistic features such as language family and script type also significantly affect model effectiveness.
Insights can guide future multilingual model development and optimization.
Abstract
Multilingual language models have gained significant attention in recent years, enabling the development of applications that serve diverse linguistic contexts. In this paper, we present a comprehensive evaluation of three popular multilingual language models: mBERT, XLM-R, and GPT-3. We assess their performance across a diverse set of languages on two distinct tasks, text classification and text generation, with a focus on understanding how resource availability (general and model-specific), language family, script type, and word order affect model performance. Our findings reveal that while the amount of language-specific pretraining data plays a crucial role in model performance, other factors such as general resource availability, language family, and script type are also important. We hope that our study contributes to a deeper understanding of…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Layer Normalization · Dropout · Weight Decay · Softmax · Byte Pair Encoding
