A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives
Zihao Li, Shaoxiong Ji, Timothee Mickus, Vincent Segonne, Jörg Tiedemann

TL;DR
This paper systematically compares multilingual language modeling and translation objectives under controlled conditions. It finds that the model architecture determines which pretraining objective works best, and that translation is a highly effective objective for multilingual pretraining.
Contribution
It provides a controlled comparison of pretraining objectives, highlighting the impact of architecture and demonstrating the effectiveness of translation-based pretraining.
Findings
Architecture determines the best pretraining objective.
Multilingual translation is highly effective for pretraining.
Controlled experiments enable fair comparison across methods.
Abstract
Pretrained language models (PLMs) display impressive performances and have captured the attention of the NLP community. Establishing best practices in pretraining has, therefore, become a major focus of NLP research, especially since insights gained from monolingual English models may not necessarily apply to more complex multilingual models. One significant caveat of the current state of the art is that different works are rarely comparable: they often discuss different parameter counts, training data, and evaluation methodology. This paper proposes a comparison of multilingual pretraining objectives in a controlled methodological environment. We ensure that training data and model architectures are comparable, and discuss the downstream performances across 6 languages that we observe in probing and fine-tuning scenarios. We make two key observations: (1) the architecture dictates…
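To make the two families of objectives concrete, the Python sketch below shows one common way a single training example can be built under each. It is a generic illustration, not the authors' implementation. The special tokens (`<s>`, `</s>`, `<sep>`) and the function names are hypothetical, and real setups operate on subword IDs rather than word strings. The decoder-only translation variant is included because the abstract stresses that architecture matters: without a separate encoder, translation is typically framed as conditional language modeling, with the loss restricted to target positions.

```python
from typing import List, Tuple

# Hypothetical special tokens, chosen for illustration only.
BOS = "<s>"
EOS = "</s>"
SEP = "<sep>"


def causal_lm_example(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Language modeling: at position i the model reads inputs[:i+1]
    and must predict labels[i], i.e. the next token."""
    seq = [BOS] + tokens + [EOS]
    return seq[:-1], seq[1:]


def translation_example(
    src: List[str], tgt: List[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Encoder-decoder translation: the encoder reads the source sentence;
    the decoder predicts each target token from the source and the previous
    gold target tokens (teacher forcing)."""
    encoder_input = src + [EOS]
    labels = tgt + [EOS]
    decoder_input = [BOS] + labels[:-1]  # labels shifted right by one
    return encoder_input, decoder_input, labels


def decoder_only_translation_example(
    src: List[str], tgt: List[str]
) -> Tuple[List[str], List[str], List[int]]:
    """Decoder-only translation: concatenate source and target into one
    sequence; a mask value of 0 excludes a position from the loss, so only
    target tokens (and the final EOS) are trained on."""
    seq = [BOS] + src + [SEP] + tgt + [EOS]
    inputs, labels = seq[:-1], seq[1:]
    n_prefix = len(src) + 1  # source tokens plus the separator
    loss_mask = [0] * n_prefix + [1] * (len(labels) - n_prefix)
    return inputs, labels, loss_mask


if __name__ == "__main__":
    src = ["the", "cat", "sleeps"]
    tgt = ["le", "chat", "dort"]
    print(causal_lm_example(src))
    print(translation_example(src, tgt))
    print(decoder_only_translation_example(src, tgt))
```

For the example pair above, the decoder-only variant yields the labels `the cat sleeps <sep> le chat dort </s>` with mask `[0, 0, 0, 0, 1, 1, 1, 1]`. The same parallel sentence therefore supplies a different training signal depending on whether the architecture has a separate encoder.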
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Focus
