# Learning Joint Multilingual Sentence Representations with Neural Machine Translation

**Authors:** Holger Schwenk, Matthijs Douze

arXiv: 1704.04154 · 2017-08-09

## TL;DR

This paper uses neural machine translation to learn joint sentence embeddings across six very different languages, and provides experimental evidence that sentences close in embedding space are semantically highly related even when their structure and syntax differ.

## Contribution

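The paper defines a new cross-lingual similarity measure, compares up to 1.4M sentence representations learned within a neural machine translation framework, and studies the characteristics of sentences that lie close together, both within one language and across languages.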

## Key findings

- Sentences that are close in embedding space are semantically highly related.
- These relations also hold when the compared sentences are in different languages.
- Close sentences often have quite different structure and syntax.

## Abstract

In this paper, we use the framework of neural machine translation to learn joint sentence representations across six very different languages. Our aim is that a representation which is independent of the language, is likely to capture the underlying semantics. We define a new cross-lingual similarity measure, compare up to 1.4M sentence representations and study the characteristics of close sentences. We provide experimental evidence that sentences that are close in embedding space are indeed semantically highly related, but often have quite different structure and syntax. These relations also hold when comparing sentences in different languages.
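The abstract does not spell out the similarity measure, so the following is only a minimal sketch of one common way to score cross-lingual closeness, not the authors' definition. It assumes that each sentence in a parallel corpus has one embedding per language, that closeness is cosine similarity, and that a retrieval counts as an error when the nearest neighbour in the other language is not the aligned translation. The names `cosine`, `nearest_index`, and `retrieval_error_rate` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_index(query, candidates):
    """Index of the candidate embedding most similar to the query."""
    return max(range(len(candidates)), key=lambda i: cosine(query, candidates[i]))


def retrieval_error_rate(src, tgt):
    """Fraction of source sentences whose nearest target is not the aligned one.

    Assumes src[i] and tgt[i] embed the same sentence in two languages.
    """
    errors = sum(1 for i, q in enumerate(src) if nearest_index(q, tgt) != i)
    return errors / len(src)


# Toy 3-dimensional embeddings (invented for illustration only).
english = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
french = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.2, 0.9, 0.3]]

# The third French vector is deliberately poor, so one of three lookups fails.
rate = retrieval_error_rate(english, french)
print(f"retrieval error rate: {rate:.3f}")
```

This brute-force loop compares every pair, which is fine for a toy example; at the scale of 1.4M sentences mentioned in the abstract, an optimized nearest-neighbour search library would be the practical choice.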

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.04154/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1704.04154/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1704.04154/full.md

---
Source: https://tomesphere.com/paper/1704.04154