Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat

TL;DR
This paper presents a comprehensive empirical study of how scaling model size and training mixture composition influence the performance of multilingual neural machine translation models, revealing key insights into effective parameter allocation and language direction effects.
Contribution
The paper introduces a novel joint scaling law formulation for multilingual models, shows that language mixture weights mainly affect the multiplicative factor of the scaling law rather than its exponent, and finds that translation direction matters more for scaling behavior than language similarity.
Findings
Models trained with different language mixture weights share the same scaling exponent; the weights change only the multiplicative factor.
Language similarity has little impact on scaling behavior.
Models translating into English have more effective parameters per task.
Abstract
In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in the model size affect the model performance and investigate the role of the training mixture composition on the scaling behavior. We find that changing the weightings of the individual language pairs in the training mixture only affects the multiplicative factor of the scaling law. In particular, we observe that multilingual models trained using different mixing rates all exhibit the same scaling exponent. Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models. We find little evidence that language similarity has any impact. In contrast, the direction of the multilinguality plays a…
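To make the "same exponent, different multiplicative factor" claim concrete, here is a minimal sketch in Python. It assumes a generic saturating power law, L(N) = beta * N^(-alpha) + L_inf, which is a common form for scaling laws. The abstract does not state the exact functional form, so the function names, the parameter names, and all numeric values below are illustrative assumptions, not the authors' formulation or fitted results. Under this assumed form, a change in beta caused by the mixture weights can be re-expressed as a change in the effective number of parameters.

```python
import math


def scaling_law_loss(n_params, beta, alpha, loss_floor):
    """Assumed saturating power law: L(N) = beta * N**(-alpha) + loss_floor."""
    return beta * n_params ** (-alpha) + loss_floor


def effective_params(n_params, beta_ref, beta_mix, alpha):
    """Parameter count at which a reference model (factor beta_ref) reaches the
    same loss as a mixture-trained model (factor beta_mix) with n_params.

    Solves beta_ref * N_eff**(-alpha) == beta_mix * N**(-alpha) for N_eff.
    Valid only when both models share the same exponent alpha.
    """
    return n_params * (beta_ref / beta_mix) ** (1.0 / alpha)


# Illustrative values only (not taken from the paper).
alpha = 0.5        # shared scaling exponent
loss_floor = 1.0   # irreducible loss
beta_ref = 1.0     # multiplicative factor of a hypothetical reference model
beta_mix = 2.0     # multiplicative factor under a hypothetical mixture weight
n_params = 1e8     # total parameters of the multilingual model

n_eff = effective_params(n_params, beta_ref, beta_mix, alpha)
loss_mix = scaling_law_loss(n_params, beta_mix, alpha, loss_floor)
loss_ref = scaling_law_loss(n_eff, beta_ref, alpha, loss_floor)

# Both losses coincide: the larger beta behaves like a smaller model.
print(n_eff, loss_mix, loss_ref)
```

In this sketch the mixture-trained model behaves, for the language pair in question, like a reference model with fewer parameters. That is one way to read the notion of "effective number of parameters allocated to each language pair", under the stated assumptions.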
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Test
