Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models
Vitalii Hirak, Jaap Jumelet, Arianna Bisazza

TL;DR
This study investigates how typological features of languages influence the performance of large multilingual translation models, revealing that certain typological traits significantly affect translation quality and that languages with certain properties benefit from alternative decoding strategies.
Contribution
It provides a comprehensive analysis of typological effects on large pre-trained models and introduces a new dataset of typological properties for 212 languages.
Findings
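Target-language typology significantly affects the translation quality of both NLLB-200 and Tower+, even after controlling for data resourcedness and writing script.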
Languages with specific properties benefit from alternative decoding strategies.
Typological properties can predict translation performance beyond resource availability.
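One common way to check whether a property predicts performance "beyond" a control such as resource availability is to compare a regression on the controls alone with one that also includes the property. The sketch below illustrates that idea only; it is not the authors' analysis. The data are synthetic, and every variable name (`log_resources`, `latin_script`, `typ_feature`, `quality`) is a hypothetical stand-in rather than a column from the paper's dataset.

```python
import numpy as np

def r_squared(features, target):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    centered = target - target.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)

# Synthetic toy data, NOT results from the paper.
rng = np.random.default_rng(0)
n_languages = 200
log_resources = rng.normal(size=n_languages)                        # assumed control
latin_script = rng.integers(0, 2, size=n_languages).astype(float)   # assumed control
typ_feature = rng.integers(0, 2, size=n_languages).astype(float)    # hypothetical binary typological trait
quality = (0.6 * log_resources + 0.2 * latin_script
           - 0.4 * typ_feature
           + rng.normal(scale=0.3, size=n_languages))               # made-up quality score

controls = np.column_stack([log_resources, latin_script])
full = np.column_stack([controls, typ_feature])

r2_controls = r_squared(controls, quality)
r2_full = r_squared(full, quality)
print(f"R^2 with controls only:      {r2_controls:.3f}")
print(f"R^2 with typological trait:  {r2_full:.3f}")
```

Because the two models are nested, the in-sample R^2 of the fuller model can never be lower, so a real analysis would also need a significance test or held-out evaluation before claiming that the typological feature adds predictive value.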
Abstract
Despite major advances in multilingual modeling, large quality disparities persist across languages. Besides the obvious impact of uneven training resources, typological properties have also been proposed to determine the intrinsic difficulty of modeling a language. The existing evidence, however, is mostly based on small monolingual language models or bilingual translation models trained from scratch. We expand on this line of work by analyzing two large pre-trained multilingual translation models, NLLB-200 and Tower+, which are state-of-the-art representatives of encoder-decoder and decoder-only machine translation, respectively. Based on a broad set of languages, we find that target language typology drives translation quality of both models, even after controlling for more trivial factors, such as data resourcedness and writing script. Additionally, languages with certain…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
