A Typologically Grounded Evaluation Framework for Word Order and Morphology Sensitivity in Multilingual Masked LMs
Anna Feldman, Libby Barak, Jing Peng

TL;DR
This paper presents a typology-aware diagnostic framework for evaluating multilingual masked language models' sensitivity to word order and morphology, revealing their reliance on surface forms across languages.
Contribution
It introduces an inference-time, perturbation-based evaluation method that uses Universal Dependencies annotations to separate multilingual models' reliance on word order from their reliance on inflectional form.
Findings
Full scrambling reduces accuracy to near zero across languages.
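Partial perturbations (content-word scrambling and head-dependent swaps) cause smaller but still large accuracy drops.
Lemmatization (+L) has little effect in Chinese but substantially lowers accuracy in German, Spanish, and Russian, and it does not mitigate the impact of scrambling.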
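The accuracy figures in these findings are word-level reconstruction accuracies, and the abstract also reports a top-5 variant. As a rough illustration only, the sketch below shows one way such a top-k score could be computed, assuming each masked position comes with a ranked list of candidate words. The function name `top_k_accuracy` and its inputs are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def top_k_accuracy(ranked_predictions: Sequence[List[str]],
                   gold_words: Sequence[str],
                   k: int = 1) -> float:
    """Fraction of masked positions whose gold word is among the top-k candidates.

    ranked_predictions[i] is a best-first list of candidate words for
    masked position i (hypothetical input format, assumed for this sketch).
    """
    if len(ranked_predictions) != len(gold_words):
        raise ValueError("predictions and gold labels must align one-to-one")
    if not gold_words:
        return 0.0
    hits = sum(gold in preds[:k]
               for preds, gold in zip(ranked_predictions, gold_words))
    return hits / len(gold_words)


# Toy data: two masked positions with ranked candidates.
EXAMPLE_PREDS = [["cats", "dogs", "cat"], ["ran", "run", "runs"]]
EXAMPLE_GOLD = ["dogs", "walked"]
```

With the toy data, the gold word "dogs" is ranked second for the first position and "walked" never appears, so the score is 0.0 at k=1 and 0.5 at k=2.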
Abstract
We introduce a typology-aware diagnostic for multilingual masked language models that tests reliance on word order versus inflectional form. Using Universal Dependencies, we apply inference-time perturbations: full token scrambling, content-word scrambling with function words fixed, dependency-based head-dependent swaps, and sentence-level lemma substitution (+L), which lemmatizes both the context and the masked target label. We evaluate mBERT and XLM-R on English, Chinese, German, Spanish, and Russian. Full scrambling drives word-level reconstruction accuracy near zero in all languages; partial and head-dependent perturbations cause smaller but still large drops. +L has little effect in Chinese but substantially lowers accuracy in German/Spanish/Russian, and it does not mitigate the impact of scrambling. Top-5 word accuracy shows the same pattern: under full scrambling, the gold word…
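The four perturbations named in the abstract can be sketched on a Universal Dependencies-style token list. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Token` class, the `FUNCTION_UPOS` set, and the rule of swapping only adjacent head-dependent pairs are all choices made here for the example, since the abstract does not specify them.

```python
import random
from dataclasses import dataclass
from typing import List

# Assumption: UPOS tags treated as function words; the paper's exact set is not given.
FUNCTION_UPOS = {"ADP", "AUX", "CCONJ", "DET", "PART", "PRON", "SCONJ", "PUNCT"}


@dataclass(frozen=True)
class Token:
    idx: int    # 1-based token id, as in CoNLL-U
    form: str   # surface form
    lemma: str
    upos: str
    head: int   # id of the syntactic head; 0 marks the root


def full_scramble(tokens: List[Token], rng: random.Random) -> List[Token]:
    """Shuffle every token position."""
    out = list(tokens)
    rng.shuffle(out)
    return out


def content_scramble(tokens: List[Token], rng: random.Random) -> List[Token]:
    """Shuffle content words among their own slots; function words stay put."""
    slots = [i for i, t in enumerate(tokens) if t.upos not in FUNCTION_UPOS]
    content = [tokens[i] for i in slots]
    rng.shuffle(content)
    out = list(tokens)
    for slot, tok in zip(slots, content):
        out[slot] = tok
    return out


def head_dependent_swap(tokens: List[Token]) -> List[Token]:
    """Swap each dependent with its head when the two are adjacent.

    Assumption: each token moves at most once, and non-adjacent pairs are left alone.
    """
    out = list(tokens)
    pos = {t.idx: i for i, t in enumerate(out)}  # token id -> current position
    moved = set()
    for t in tokens:
        if t.head == 0 or t.idx in moved or t.head in moved:
            continue
        i, j = pos[t.idx], pos[t.head]
        if abs(i - j) == 1:
            out[i], out[j] = out[j], out[i]
            pos[t.idx], pos[t.head] = j, i
            moved.update({t.idx, t.head})
    return out


def lemmatize(tokens: List[Token]) -> List[str]:
    """+L: replace every form with its lemma (a masked target's label would be its lemma too)."""
    return [t.lemma for t in tokens]


# Toy sentence with hand-written annotations (illustrative, not from a treebank).
SENT = [
    Token(1, "The", "the", "DET", 2),
    Token(2, "dogs", "dog", "NOUN", 3),
    Token(3, "chased", "chase", "VERB", 0),
    Token(4, "a", "a", "DET", 5),
    Token(5, "cat", "cat", "NOUN", 3),
    Token(6, ".", ".", "PUNCT", 3),
]
```

On the toy sentence, `head_dependent_swap(SENT)` yields the order "dogs The chased cat a .", and `lemmatize(SENT)` yields "the dog chase a cat .". The perturbations can be composed, for example by lemmatizing the output of `full_scramble`, which mirrors the combination of +L with scrambling described in the abstract.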
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Neurobiology of Language and Bilingualism
