# Consistency by Agreement in Zero-shot Neural Machine Translation

**Authors:** Maruan Al-Shedivat, Ankur P. Parikh

arXiv: 1904.02338 · 2019-04-11

## TL;DR

This paper introduces an agreement-based training method for zero-shot neural machine translation that improves translation quality on language pairs unseen during training without sacrificing performance on supervised pairs.

## Contribution

The paper reformulates multilingual translation as probabilistic inference, defines zero-shot consistency, and proposes an agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages.

## Key findings

- Often yields a 2-3 BLEU improvement in zero-shot translation over strong baselines
- Shows no loss in performance on supervised translation directions
- Evaluated on multiple public zero-shot benchmarks (IWSLT17, UN corpus, Europarl)

## Abstract

Generalization and reliability of multilingual translation often highly depend on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We test our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often results in 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions.
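The agreement idea in point (iii) can be illustrated with a toy calculation. The sketch below is not the paper's objective, which this summary does not reproduce; it assumes one simple way to score agreement, namely the negative log-probability that two translation distributions select the same auxiliary-language sentence. The function name `agreement_loss`, the candidate sentences, and all probabilities are hypothetical.

```python
import math


def agreement_loss(p_given_en, p_given_fr):
    """Negative log-probability that the two translation distributions
    pick the same auxiliary-language sentence (illustrative assumption,
    not the paper's exact loss).

    Each argument maps a candidate translation (str) to its probability.
    """
    shared = set(p_given_en) & set(p_given_fr)
    overlap = sum(p_given_en[z] * p_given_fr[z] for z in shared)
    if overlap == 0.0:
        return math.inf  # the two distributions never agree
    return -math.log(overlap)


# Toy distributions over Spanish candidates, conditioned on a parallel
# English/French sentence pair (made-up numbers).
p_given_en = {"hola mundo": 0.7, "hola a todos": 0.3}
p_given_fr = {"hola mundo": 0.6, "saludos mundo": 0.4}

loss = agreement_loss(p_given_en, p_given_fr)
print(f"agreement loss: {loss:.4f}")
```

The loss is zero when both distributions put all their mass on the same translation and grows as they diverge, so under these assumptions minimizing it pushes the two translation directions toward equivalent outputs.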

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.02338/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1904.02338/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/1904.02338/full.md

---
Source: https://tomesphere.com/paper/1904.02338