On the ability of monolingual models to learn language-agnostic representations
Leandro Rodrigues de Souza, Rodrigo Nogueira, Roberto Lotufo

TL;DR
This paper demonstrates that monolingual pretrained models can learn language-agnostic representations, performing well across different languages even when trained on a single language, challenging the belief that multilingual training is necessary for cross-lingual transfer.
Contribution
It provides evidence that monolingual models can achieve language-agnostic representations without multilingual pretraining, showing comparable performance across languages.
Findings
Monolingual models perform well on cross-lingual tasks.
Models pretrained on distant languages like German and Portuguese perform similarly on English tasks.
Language-agnostic representations can be learned from single-language pretraining.
Abstract
Pretrained multilingual models have become a de facto default approach for zero-shot cross-lingual transfer. Previous work has shown that these models are able to achieve cross-lingual representations when pretrained on two or more languages with shared parameters. In this work, we provide evidence that a model can achieve language-agnostic representations even when pretrained on a single language. That is, we find that monolingual models pretrained and finetuned on different languages achieve competitive performance compared to the ones that use the same target language. Surprisingly, the models show a similar performance on a same task regardless of the pretraining language. For example, models pretrained on distant languages such as German and Portuguese perform similarly on English tasks.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
