# Probing Multilingual Sentence Representations With X-Probe

**Authors:** Vinit Ravishankar, Lilja Øvrelid, Erik Velldal

arXiv: 1906.05061 · 2019-06-13

## TL;DR

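This paper introduces multilingual probing datasets derived from Wikipedia and evaluates six sentence encoders in each of five languages (English, French, German, Spanish and Russian). It finds that cross-lingually mapped representations often retain certain linguistic information better than representations from English encoders trained on natural language inference (NLI).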

## Contribution

The paper provides Wikipedia-derived probing datasets for five languages and evaluates six sentence encoders per language, each trained by mapping sentence representations to English sentence representations using sentences in a parallel corpus.

## Key findings

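- Cross-lingually mapped representations are often better at retaining certain linguistic information than representations from English encoders trained on NLI.
- Probing datasets derived from Wikipedia enable probing in five languages: English, French, German, Spanish and Russian.
- The advantage is not uniform: the abstract reports it as holding often and for certain kinds of linguistic information, not across the board.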

## Abstract

This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.
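To make the idea of probing concrete, here is a minimal sketch of a probing classifier: a simple model trained on frozen sentence representations to predict a linguistic property, where held-out accuracy is read as a sign of how much of that property the representations retain. Everything in it is an assumption for illustration only. The embeddings are synthetic random vectors, not outputs of the paper's encoders; the binary label stands in for an unspecified linguistic property; and `train_probe` and `probe_accuracy` are hypothetical helper names. The paper's actual probing tasks and classifier setup are not described in this summary.

```python
import numpy as np


def train_probe(X, y, lr=0.5, steps=500):
    """Fit a logistic-regression probe on frozen embeddings.

    X: (n, d) array of sentence representations (never updated here).
    y: (n,) array of binary labels for the probed property.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = X @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(X, y, w, b):
    """Fraction of examples whose label the probe predicts correctly."""
    preds = (X @ w + b > 0).astype(int)
    return float((preds == y).mean())


# Synthetic stand-in for encoder outputs (assumption: 16-dim random vectors).
rng = np.random.default_rng(0)
dim = 16
X = rng.normal(size=(400, dim))

# Synthetic "linguistic property": linearly decodable from X by construction.
w_true = rng.normal(size=dim)
y = (X @ w_true > 0).astype(int)

# Train the probe on one split and evaluate it on held-out sentences.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]
w, b = train_probe(X_train, y_train)
acc = probe_accuracy(X_test, y_test, w, b)
print(f"held-out probe accuracy: {acc:.2f}")
```

In a multilingual comparison like the one the abstract describes, the same probe would be trained separately on representations from each encoder and language, and the resulting accuracies compared.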

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.05061/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1906.05061/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1906.05061/full.md

---
Source: https://tomesphere.com/paper/1906.05061