Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations
Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova

TL;DR
This paper investigates how multilingual Transformer models encode syntactic structure by analyzing their sensitivity to controlled text perturbations across different languages and layers, revealing limited use of positional information.
Contribution
It introduces nine new probing datasets for three languages and demonstrates that syntactic sensitivity varies with language, model objectives, and layer depth, challenging assumptions about positional encoding.
Findings
Syntactic sensitivity varies by language and pre-training objectives.
Sensitivity grows across layers as perturbation granularity increases.
Models minimally utilize positional information for syntactic tree induction.
Abstract
Recent research has adopted a new experimental field centered around the concept of text perturbations, which has revealed that shuffled word order has little to no impact on the downstream performance of Transformer-based language models across many NLP tasks. These findings contradict the common understanding of how the models encode hierarchical and structural information and even question whether word order is modeled with position embeddings. To this end, this paper proposes nine probing datasets organized by the type of controllable text perturbation for three Indo-European languages with a varying degree of word order flexibility: English, Swedish and Russian. Based on the probing analysis of the M-BERT and M-BART models, we report that the syntactic sensitivity depends on the language and model pre-training objectives. We also find that the sensitivity grows across layers…
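To make the idea of a controllable word-order perturbation concrete, here is a minimal Python sketch. It is not the authors' dataset construction procedure: the function name `shuffle_ngrams`, the whitespace tokenization, and the chunk-shuffling scheme are all assumptions chosen for illustration. The parameter `n` stands in for perturbation granularity: `n=1` shuffles individual words, while larger values keep longer spans intact.

```python
import random


def shuffle_ngrams(sentence: str, n: int, seed: int = 0) -> str:
    """Shuffle a sentence at the level of contiguous n-token chunks.

    Hypothetical helper for illustration only; the paper's perturbations
    may be defined differently.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Naive whitespace tokenization (an assumption of this sketch).
    tokens = sentence.split()
    # Split the token sequence into consecutive chunks of length n;
    # the last chunk may be shorter.
    chunks = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    # A seeded generator keeps the perturbation reproducible.
    rng = random.Random(seed)
    rng.shuffle(chunks)
    # Flatten the shuffled chunks back into a single string.
    return " ".join(tok for chunk in chunks for tok in chunk)


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    for n in (1, 2, 3):
        print(n, shuffle_ngrams(text, n))
```

Word order inside each chunk is preserved, so the same sentence can be perturbed at several granularities and the resulting variants compared in a probing setup.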
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
