# Improving Zero-shot Translation with Language-Independent Constraints

**Authors:** Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, Alex Waibel

arXiv: 1906.08584 · 2019-06-21

## TL;DR

This paper improves zero-shot translation in multilingual NMT by first building an encoder architecture that is independent of the source language, then adding regularization methods to the standard Transformer. On the IWSLT 2017 multilingual dataset, this yields an average gain of 2.23 BLEU points across 12 language pairs.

## Contribution

The paper introduces a source-language-independent encoder architecture as a proof of concept, and regularization methods for the standard Transformer that make multilingual NMT models more robust in zero-shot conditions.

## Key findings

- Achieved an average improvement of 2.23 BLEU points across 12 language pairs over the zero-shot performance of a state-of-the-art multilingual system.
- Confirmed the effect in further experiments on language pairs with multiple intermediate pivots.
- Provided insights into how multilingual representations are learned in NMT models.

## Abstract

An important concern in training multilingual neural machine translation (NMT) is to translate between language pairs unseen during training, i.e., zero-shot translation. Improving this ability kills two birds with one stone by providing an alternative to pivot translation, which also allows us to better understand how the model captures information between languages. In this work, we carried out an investigation on this capability of the multilingual NMT models. First, we intentionally create an encoder architecture which is independent with respect to the source language. Such experiments shed light on the ability of NMT encoders to learn multilingual representations, in general. Based on such proof of concept, we were able to design regularization methods into the standard Transformer model, so that the whole architecture becomes more robust in zero-shot conditions. We investigated the behaviour of such models on the standard IWSLT 2017 multilingual dataset. We achieved an average improvement of 2.23 BLEU points across 12 language pairs compared to the zero-shot performance of a state-of-the-art multilingual system. Additionally, we carry out further experiments in which the effect is confirmed even for language pairs with multiple intermediate pivots.
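The abstract does not spell out what the regularization methods look like, so the following is only a rough sketch of one way a language-independence constraint could be added to a translation loss. It assumes (this is not stated in the summary) that the encoder states of a source sentence and of its parallel target sentence are mean-pooled over time and pulled together with a squared-distance penalty. The names `mean_pool`, `similarity_penalty`, `total_loss`, and the `weight` parameter are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool(states: List[Vector]) -> Vector:
    """Average a variable-length sequence of encoder states into one vector."""
    if not states:
        raise ValueError("need at least one encoder state")
    dim = len(states[0])
    return [sum(state[i] for state in states) / len(states) for i in range(dim)]


def similarity_penalty(src_states: List[Vector], tgt_states: List[Vector]) -> float:
    """Mean squared distance between the pooled representations.

    ASSUMPTION: pooling plus squared distance is an illustrative choice,
    not necessarily the regularizer used in the paper.
    """
    src = mean_pool(src_states)
    tgt = mean_pool(tgt_states)
    return sum((a - b) ** 2 for a, b in zip(src, tgt)) / len(src)


def total_loss(
    translation_loss: float,
    src_states: List[Vector],
    tgt_states: List[Vector],
    weight: float = 1.0,  # hypothetical trade-off hyperparameter
) -> float:
    """Translation loss plus the weighted language-independence penalty."""
    return translation_loss + weight * similarity_penalty(src_states, tgt_states)


# Toy encoder states: two source tokens and one target token, dimension 2.
source_states = [[1.0, 2.0], [3.0, 4.0]]
target_states = [[0.0, 3.0]]
loss = total_loss(1.5, source_states, target_states, weight=0.5)
```

Pooling is used here because parallel sentences in different languages generally have different lengths, so their encoder states cannot be compared position by position. With a penalty of this kind, sentences with the same meaning are encouraged to map to nearby representations regardless of the input language.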

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08584/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08584/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1906.08584/full.md

---
Source: https://tomesphere.com/paper/1906.08584