Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation
Oleg Serikov, Vitaly Protasov, Ekaterina Voloshina, Viktoria Knyazkova, Tatiana Shavrina

TL;DR
This paper introduces a multilingual probing framework that systematically evaluates language models across 104 languages and 80 morphosyntactic features, revealing Western European language biases and enabling standardized interpretation.
Contribution
The authors present a GUI-assisted, extensible toolkit for comprehensive, reproducible probing of multilingual models across diverse languages and features, filling a gap in typological diversity analysis.
Findings
Most regularities in mBERT are Western European-centric.
The framework covers 104 languages and 80 features.
It facilitates reproducible, standardized multilingual model evaluation.
Abstract
Linguistic analysis of language models is one of the ways to explain and describe their reasoning, weaknesses, and limitations. In the probing part of the model interpretability research, studies concern individual languages as well as individual linguistic structures. The question arises: are the detected regularities linguistically coherent, or on the contrary, do they dissonate at the typological scale? Moreover, the majority of studies address the inherent set of languages and linguistic structures, leaving the actual typological diversity knowledge out of scope. In this paper, we present and apply the GUI-assisted framework allowing us to easily probe a massive number of languages for all the morphosyntactic features present in the Universal Dependencies data. We show that reflecting the anglo-centric trend in NLP over the past years, most of the regularities revealed in the mBERT…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Massively multilingual probing based on Universal Dependencies · mBERT
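
To make the idea of probing on Universal Dependencies data more concrete, here is a minimal sketch of how labels for one morphosyntactic feature could be read from the FEATS column of a CoNLL-U file. It is an illustration only and not the authors' framework: the helper names (parse_feats, probing_labels) and the two-token sample are hypothetical, and the only thing assumed about the data is the standard CoNLL-U layout (ten tab-separated columns, with FEATS in the sixth).

```python
from collections import Counter


def parse_feats(feats_field):
    """Turn a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats_field == "_":  # "_" marks an empty feature set in CoNLL-U
        return {}
    return dict(pair.split("=", 1) for pair in feats_field.split("|"))


def probing_labels(conllu_text, feature):
    """Yield (word form, feature value) pairs for tokens that carry `feature`."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        value = parse_feats(cols[5]).get(feature)
        if value is not None:
            yield cols[1], value


# Made-up sample in CoNLL-U layout, used only to exercise the helpers.
SAMPLE = "\n".join([
    "# text = Dogs bark",
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\tMood=Ind|Number=Plur|Tense=Pres\t0\troot\t_\t_",
])

pairs = list(probing_labels(SAMPLE, "Number"))
print(pairs)
print(Counter(value for _, value in pairs))
```

In a probing setup, pairs like these would typically serve as targets for a small classifier trained on frozen model representations (for example, from mBERT). That step is omitted here because the summary above does not specify the classifier or the layer selection.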
