Multilingual Transformer Encoders: a Word-Level Task-Agnostic Evaluation
Félix Gaschi, François Plesse, Parisa Rastin, Yannick Toussaint

TL;DR
This paper introduces a word-level, task-agnostic method for evaluating multilingual Transformer encoders. It finds that certain inner layers are better aligned across languages than explicitly aligned representations.
Contribution
It proposes a new method for evaluating word-level multilingual alignment in Transformer models. The method extracts more accurate translated word pairs than previous approaches, and the authors use it to show that specific inner layers are well aligned.
Findings
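Some inner layers of multilingual Transformers outperform explicitly aligned representations, especially under a stricter definition of multilingual alignment.
The proposed method provides more accurate translated word pairs than previous word-level alignment evaluations.
Some layers are better aligned across languages than others.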
Abstract
Some Transformer-based models can perform cross-lingual transfer learning: they can be trained on a specific task in one language and give relatively good results on the same task in another language, despite having been pre-trained on monolingual tasks only. However, there is no consensus yet on whether these Transformer-based models learn universal patterns across languages. We propose a word-level, task-agnostic method to evaluate the alignment of contextualized representations built by such models. We show that our method provides more accurate translated word pairs than previous methods for evaluating word-level alignment. Our results also show that some inner layers of multilingual Transformer-based models outperform other explicitly aligned representations, and even more so according to a stricter definition of multilingual alignment.
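The abstract does not spell out how alignment is scored. As a hedged illustration only, one common way to score word-level alignment is nearest-neighbour retrieval over the contextual embeddings of translated word pairs. The sketch below assumes the pairs have already been extracted and embedded (one vector per word occurrence, e.g. from a single layer), uses cosine similarity, and approximates a "stricter" definition by letting same-language words compete as neighbours. The names `retrieval_accuracy`, `cosine` and the `strict` flag are hypothetical; this is not the authors' code or their exact criterion.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieval_accuracy(src: List[Vector], tgt: List[Vector], strict: bool = False) -> float:
    """Share of source words whose nearest neighbour is their own translation.

    src[i] and tgt[i] are contextual embeddings of the i-th translated word pair
    (hypothetical input format, e.g. one layer's vectors for aligned word occurrences).
    strict=False: only target-language embeddings are candidate neighbours.
    strict=True: embeddings from BOTH languages compete (except the query itself),
    so a closer same-language word counts as a miss.
    """
    if len(src) != len(tgt):
        raise ValueError("src and tgt must hold the same number of word pairs")
    if not src:
        return 0.0
    pool = {"tgt": tgt, "src": src}
    hits = 0
    for i, query in enumerate(src):
        # Each candidate is a (language, index) pair.
        candidates = [("tgt", j) for j in range(len(tgt))]
        if strict:
            candidates += [("src", j) for j in range(len(src)) if j != i]
        best = max(candidates, key=lambda c: cosine(query, pool[c[0]][c[1]]))
        hits += best == ("tgt", i)
    return hits / len(src)


# Toy space: dims 0-1 carry meaning, dim 2 marks "source language", dim 3 "target language".
src = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 2.0, 0.0]]
tgt = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 2.0]]
print(retrieval_accuracy(src, tgt))               # → 1.0
print(retrieval_accuracy(src, tgt, strict=True))  # → 0.0
```

In this toy space each source word is closest to its translation among target words, yet even closer to the other source word because of the shared language component. The loose score is therefore perfect while the strict score is zero, which is the kind of gap a stricter definition of alignment can expose.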
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
