Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models
Catherine Arnett, Tyler A. Chang, James A. Michaelov, Benjamin K. Bergen

TL;DR
This paper investigates how and when multilingual language models develop shared abstract grammatical representations across languages, using structural priming to analyze Dutch-English models during early pre-training stages.
Contribution
It extends structural priming analysis to a Dutch-English bilingual setting and examines the emergence of crosslingual grammatical representations during early pre-training.
Findings
Crosslingual priming effects appear after less than 1 million tokens in the second language.
Abstract grammatical representations develop early in multilingual models.
Implications for low-resource language transfer and data contamination are discussed.
Abstract
Do multilingual language models share abstract grammatical representations across languages, and if so, when do these develop? Following Sinclair et al. (2022), we use structural priming to test for abstract grammatical representations with causal effects on model outputs. We extend the approach to a Dutch-English bilingual setting, and we evaluate a Dutch-English language model during pre-training. We find that crosslingual structural priming effects emerge early after exposure to the second language, with less than 1M tokens of data in that language. We discuss implications for data contamination, low-resource transfer, and how abstract grammatical representations emerge in multilingual models.
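As a rough illustration of the structural priming measurement described above, the sketch below compares how likely a model finds an English target sentence after a Dutch prime with the same structure versus a Dutch prime with the alternative structure. This log-probability difference is a common formulation, but this summary does not state the exact metric, so treat it as an assumption. The names `priming_effect` and `placeholder_logprob`, the example sentences, and the numeric scores are hypothetical stand-ins, not values or stimuli from the paper; a real evaluation would replace the placeholder scorer with log-probabilities from a language-model checkpoint.

```python
from typing import Callable, Dict, Tuple

# Hypothetical scorer type: returns log P(target | prime) under some model.
LogProbFn = Callable[[str, str], float]


def priming_effect(
    logprob: LogProbFn,
    target: str,
    congruent_prime: str,
    incongruent_prime: str,
) -> float:
    """Log-probability gain for `target` after a structurally matching prime.

    A positive value means the target is more likely after the prime that
    shares its structure than after the prime with the alternative structure.
    """
    return logprob(target, congruent_prime) - logprob(target, incongruent_prime)


# Illustrative sentences (assumed, not the paper's stimuli):
# Dutch primes in prepositional-object (PO) and double-object (DO) form,
# and an English PO target.
PO_PRIME_NL = "De kok gaf een hoed aan de bokser."
DO_PRIME_NL = "De kok gaf de bokser een hoed."
PO_TARGET_EN = "The teacher sent a letter to the student."

# Made-up placeholder log-probabilities, keyed by (prime, target).
# These are NOT results; they only make the example runnable.
PLACEHOLDER_SCORES: Dict[Tuple[str, str], float] = {
    (PO_PRIME_NL, PO_TARGET_EN): -20.0,
    (DO_PRIME_NL, PO_TARGET_EN): -21.5,
}


def placeholder_logprob(target: str, prime: str) -> float:
    """Stand-in for a model call; looks up a fixed placeholder score."""
    return PLACEHOLDER_SCORES[(prime, target)]


effect = priming_effect(placeholder_logprob, PO_TARGET_EN, PO_PRIME_NL, DO_PRIME_NL)
print(f"crosslingual priming effect (log-prob difference): {effect}")
```

Passing the scorer in as a function keeps the comparison independent of any particular model, so the same measurement could in principle be repeated at successive pre-training checkpoints to see when the effect becomes positive.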
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Neurobiology of Language and Bilingualism
