# An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

**Authors:** Cristina España-Bonet, Ádám Csaba Varga, Alberto Barrón-Cedeño, Josef van Genabith

arXiv: 1704.05415 · 2017-11-16

## TL;DR

This paper investigates the use of neural machine translation (NMT) encoder outputs as interlingual sentence representations and shows that they identify parallel sentences in comparable corpora with F1=98.2% on shared-task data.

## Contribution

The paper systematically evaluates NMT context vectors (the output of the encoder) as an interlingua representation of a sentence, then uses parallel sentence identification in comparable corpora as an extrinsic evaluation of that representation.

## Key findings

- NMT context vectors effectively represent sentence semantics across languages.
- Using only NMT context vectors, parallel sentence identification reaches F1=98.2% on data from a shared task.
- Using context vectors jointly with similarity measures raises F1 to 98.9%.
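
For readers unfamiliar with the metric, the F1 figures above are the harmonic mean of precision and recall over the sentence pairs a system labels as parallel. The following is a minimal sketch of that computation, not the paper's evaluation script; the function name `f1_score` and the representation of pairs as `(source_id, target_id)` tuples are assumptions made for illustration.

```python
def f1_score(predicted, gold):
    """Return F1 for predicted vs. gold sets of (source_id, target_id) pairs.

    Hypothetical helper: the pair representation is an assumption,
    not the format used in the paper or the shared task.
    """
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Covers empty predictions, empty gold, and no overlap.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy data: two of the three predicted pairs are correct.
predicted_pairs = {(0, 0), (1, 1), (2, 3)}
gold_pairs = {(0, 0), (1, 1), (2, 2)}
score = f1_score(predicted_pairs, gold_pairs)
```

In this toy case precision and recall are both 2/3, so F1 is also 2/3.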

## Abstract

End-to-end neural machine translation has overtaken statistical machine translation in terms of translation quality for some language pairs, especially those with large amounts of parallel data. Besides this palpable improvement, neural networks provide several new properties. A single system can be trained to translate between many languages at almost no additional cost other than training time. Furthermore, internal representations learned by the network serve as a new semantic representation of words (or sentences) which, unlike standard word embeddings, are learned in an essentially bilingual or even multilingual context. In view of these properties, the contribution of the present work is two-fold. First, we systematically study the NMT context vectors, i.e. the output of the encoder, and their power as an interlingua representation of a sentence. We assess their quality and effectiveness by measuring similarities across translations, as well as across semantically related and semantically unrelated sentence pairs. Second, as an extrinsic evaluation of the first point, we identify parallel sentences in comparable corpora, obtaining F1=98.2% on data from a shared task when using only NMT context vectors. Using context vectors jointly with similarity measures, F1 reaches 98.9%.
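
The idea of measuring similarity between sentences through their context vectors can be illustrated with a small sketch. It assumes that per-token encoder outputs are pooled into one fixed-size sentence vector by summation, that similarity is cosine similarity, and that a fixed threshold decides whether a pair is parallel. All three choices, the function names, and the toy two-dimensional vectors are assumptions for illustration; the abstract does not specify the pooling, the measure, or the decision rule, and no real NMT encoder is involved here.

```python
import math


def sentence_vector(context_vectors):
    """Pool per-token context vectors into one sentence vector by summing.

    Summation is an assumed pooling choice for this sketch.
    """
    dim = len(context_vectors[0])
    return [sum(token[i] for token in context_vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_parallel(src_context, tgt_context, threshold=0.5):
    """Label a pair parallel when pooled-vector similarity meets the threshold.

    The threshold value is hypothetical and would need tuning on held-out data.
    """
    similarity = cosine(sentence_vector(src_context), sentence_vector(tgt_context))
    return similarity >= threshold


# Toy stand-ins for encoder outputs (one inner list per token).
source = [[1.0, 0.0], [0.0, 1.0]]   # pools to [1.0, 1.0]
translation = [[2.0, 2.0]]          # same direction as the source
unrelated = [[1.0, -1.0]]           # orthogonal to the source

sim_translation = cosine(sentence_vector(source), sentence_vector(translation))
sim_unrelated = cosine(sentence_vector(source), sentence_vector(unrelated))
```

Because cosine similarity ignores vector length, sentences of different lengths remain comparable after sum pooling, which is one reason it is a common choice for this kind of comparison.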

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.05415/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1704.05415/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1704.05415/full.md

---
Source: https://tomesphere.com/paper/1704.05415