Is Cross-Lingual Transfer in Bilingual Models Human-Like? A Study with Overlapping Word Forms in Dutch and English
Iza Škrjanec, Irene Elisabeth Winther, Vera Demberg, Stefan L. Frank

TL;DR
This study investigates whether bilingual language models exhibit human-like cross-lingual activation patterns. It finds that cross-lingual effects arise mainly when embeddings are shared, which may limit the models' explanatory power for bilingual reading.
Contribution
It demonstrates how embedding-sharing conditions affect cross-lingual effects in bilingual models, highlighting the importance of lexical encoding for human-like processing.
Findings
Models show cross-lingual effects mainly when embeddings are shared.
Facilitation occurs for both cognates and false friends with shared embeddings.
Frequency influences cross-lingual effects more than form-meaning consistency.
Abstract
Bilingual speakers show cross-lingual activation during reading, especially for words with shared surface form. Cognates (friends) typically lead to facilitation, whereas interlingual homographs (false friends) cause interference or no effect. We examine whether cross-lingual activation in bilingual language models mirrors these patterns. We train Dutch-English causal Transformers under four vocabulary-sharing conditions that manipulate whether (false) friends receive shared or language-specific embeddings. Using psycholinguistic stimuli from bilingual reading studies, we evaluate the models through surprisal and embedding similarity analyses. The models largely maintain language separation, and cross-lingual effects arise primarily when embeddings are shared. In these cases, both friends and false friends show facilitation relative to controls. Regression analyses reveal that these…
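
The two measures named in the abstract, surprisal and embedding similarity, can be illustrated with a minimal sketch. It assumes the standard definitions: surprisal as the negative log probability of a word given its context, and similarity as the cosine between two embedding vectors. It also assumes that facilitation would appear as lower surprisal for a target word than for its control. The function names and toy probabilities are hypothetical and are not taken from the paper.

```python
import math


def surprisal_bits(prob):
    """Surprisal (in bits) of a word whose model probability is `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy probabilities (illustration only, not results from the paper).
p_target = 0.25    # assumed model probability of a cognate in context
p_control = 0.125  # assumed model probability of a matched control word

# Under the assumption above, a positive difference means the target
# is less surprising than its control.
facilitation = surprisal_bits(p_control) - surprisal_bits(p_target)
```

With real models, `prob` would come from the language model's next-token distribution, and the vectors would be the learned embeddings of the Dutch and English forms of a word.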
