Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models

Terra Blevins; Hila Gonen; Luke Zettlemoyer

arXiv:2205.11758·cs.CL·October 25, 2022·1 cites

TL;DR

This paper investigates the dynamics of multilingual pretraining in models like XLM-R, revealing how in-language and cross-lingual abilities develop at different stages and layers during training.

Contribution

The paper probes checkpoints taken throughout XLM-R pretraining, showing when in-language linguistic skills and cross-lingual transfer abilities emerge and how they change over time.

Findings

01 High in-language performance emerges early in pretraining.

02 Cross-lingual transferability varies across language pairs.

03 Final layer performance degrades over time, with linguistic knowledge moving to lower layers.

Abstract

The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about the dynamics of the multilingual pretraining process. We investigate when these models acquire their in-language and cross-lingual abilities by probing checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones. In contrast, the point in pretraining when the model learns to transfer cross-lingually differs across language pairs. Interestingly, we also observe that, across many languages and tasks, the final model layer exhibits significant performance degradation over time,…
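As a rough illustration of the probing setup the abstract describes, the sketch below trains a linear probe on frozen features and reports accuracy for each (checkpoint, layer) cell. It is a minimal sketch under stated assumptions, not the authors' code: the features are synthetic stand-ins for hidden states, and the names `synthetic_features`, `run_probe`, `demo`, the cell labels, and the signal strengths are all hypothetical.

```python
import numpy as np


def train_linear_probe(X, y, num_classes, lr=0.1, epochs=300):
    """Fit a softmax linear classifier on frozen features X of shape (n, d)."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    return float(((X @ W + b).argmax(axis=1) == y).mean())


def synthetic_features(rng, means, labels, signal):
    """Hypothetical stand-in for hidden states: class means scaled by `signal`, plus unit noise."""
    noise = rng.normal(size=(labels.shape[0], means.shape[1]))
    return signal * means[labels] + noise


def run_probe(X_train, y_train, X_eval, y_eval, num_classes):
    """Standardize with training statistics, fit the probe, score on the eval set."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8
    W, b = train_linear_probe((X_train - mu) / sigma, y_train, num_classes)
    return probe_accuracy(W, b, (X_eval - mu) / sigma, y_eval)


def demo(seed=0):
    """Probe two made-up (checkpoint, layer) cells; signal values are arbitrary, not measured."""
    rng = np.random.default_rng(seed)
    num_classes, dim = 3, 8
    means = rng.normal(size=(num_classes, dim))
    y_train = rng.integers(0, num_classes, size=300)
    y_eval = rng.integers(0, num_classes, size=150)
    signal_by_cell = {
        ("checkpoint_A", "layer_1"): 0.0,  # assumed: features carry no label information
        ("checkpoint_B", "layer_1"): 3.0,  # assumed: features separate the classes well
    }
    results = {}
    for cell, signal in signal_by_cell.items():
        X_train = synthetic_features(rng, means, y_train, signal)
        X_eval = synthetic_features(rng, means, y_eval, signal)
        results[cell] = run_probe(X_train, y_train, X_eval, y_eval, num_classes)
    return results


if __name__ == "__main__":
    for cell, acc in demo().items():
        print(cell, round(acc, 3))
```

In a real analysis the synthetic features would be replaced by hidden states extracted from each saved checkpoint and layer. Passing an evaluation set from the training language to `run_probe` would correspond to the in-language setting, and passing one from a different language would correspond to cross-lingual transfer.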

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: XLM-R