Mutual Information and Diverse Decoding Improve Neural Machine Translation

Jiwei Li; Dan Jurafsky

arXiv:1601.00372 · cs.CL · March 24, 2016 · 99 cites


TL;DR

This paper proposes a mutual-information-based objective and a diverse decoding algorithm for neural machine translation, improving performance on the WMT German/English and French/English tasks.

Contribution

It introduces a mutual information objective, implemented as N-best re-ranking, and a diversity-promoting decoding method for neural MT; together they improve over standard likelihood-trained LSTM and attention-based models.
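The following worked derivation shows one common way to turn "maximize mutual information" into a decoding rule. It assumes a λ-weighted pointwise mutual information $\log \frac{p(x,y)}{p(x)\,p(y)^{\lambda}}$, where λ is a tuning weight introduced here for illustration. Bayes' rule rewrites this objective so that it needs only the forward model $p(y \mid x)$ and a backward model $p(x \mid y)$. Both can score complete candidates, which is why re-ranking an N-best list is a natural implementation.

```latex
% Weighted PMI objective: since p(x,y)/p(x) = p(y|x),
\hat{y} = \arg\max_{y} \Big\{ \log p(y \mid x) - \lambda \log p(y) \Big\}

% Bayes' rule: \log p(y) = \log p(y \mid x) + \log p(x) - \log p(x \mid y), so
\log p(y \mid x) - \lambda \log p(y)
  = (1-\lambda)\,\log p(y \mid x) + \lambda\,\log p(x \mid y) - \lambda\,\log p(x)

% \log p(x) does not depend on y, so it drops out of the arg max:
\hat{y} = \arg\max_{y} \Big\{ (1-\lambda)\,\log p(y \mid x) + \lambda\,\log p(x \mid y) \Big\}
```

With λ = 0 this reduces to ordinary likelihood decoding. With λ = 1 it ranks candidates purely by how well they predict the source.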

Findings

01

Consistent performance improvements on WMT translation tasks.

02

Effective mutual information-based re-ranking method.

03

Enhanced diversity in translation outputs.
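The mutual-information-based re-ranking in finding 02 can be sketched as a second pass over the first-pass N-best list, using the two-term score $(1-\lambda)\log p(y \mid x) + \lambda \log p(x \mid y)$. This is a minimal illustration, not the authors' code. `Hypothesis`, `mmi_rerank`, `backward_logp`, and `lam` are hypothetical names. The backward model is stubbed with a lookup table of made-up scores, and the paper's actual reranker may use additional features.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    tokens: List[str]
    forward_logp: float  # log p(y | x) from the first-pass (source-to-target) model


def mmi_rerank(
    source: Sequence[str],
    nbest: List[Hypothesis],
    backward_logp: Callable[[Sequence[str], Sequence[str]], float],
    lam: float = 0.5,
) -> List[Hypothesis]:
    """Sort an N-best list by (1 - lam) * log p(y|x) + lam * log p(x|y), best first."""

    def mmi_score(h: Hypothesis) -> float:
        # backward_logp(source, target) plays the role of log p(x | y): a
        # target-to-source model scoring how well y accounts for the source x.
        return (1.0 - lam) * h.forward_logp + lam * backward_logp(source, h.tokens)

    return sorted(nbest, key=mmi_score, reverse=True)


# Toy example with made-up scores: the forward model prefers the truncated hyp_a,
# but a hypothetical backward model finds hyp_b far more faithful to the source.
source = ["das", "ist", "gut"]
hyp_a = Hypothesis(tokens=["that", "is"], forward_logp=-1.0)
hyp_b = Hypothesis(tokens=["that", "is", "good"], forward_logp=-1.5)
toy_backward = {("that", "is"): -5.0, ("that", "is", "good"): -1.0}


def lookup_backward(src: Sequence[str], tgt: Sequence[str]) -> float:
    # Hypothetical stand-in for a trained p(x | y) model.
    return toy_backward[tuple(tgt)]


reranked = mmi_rerank(source, [hyp_a, hyp_b], lookup_backward, lam=0.5)
print([" ".join(h.tokens) for h in reranked])  # ['that is good', 'that is']
```

Setting `lam=0.0` recovers the first-pass order, with `hyp_a` on top. The backward term is what lets a longer, more complete translation overtake a short one that the forward model alone favors.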

Abstract

Sequence-to-sequence neural translation models learn semantic and syntactic relations between sentence pairs by optimizing the likelihood of the target given the source, i.e., $p(y \mid x)$, an objective that ignores other potentially useful sources of information. We introduce an alternative objective function for neural MT that maximizes the mutual information between the source and target sentences, modeling the bi-directional dependency of sources and targets. We implement the model with a simple re-ranking method, and also introduce a decoding algorithm that increases diversity in the N-best list produced by the first pass. Applied to the WMT German/English and French/English tasks, the proposed models offer a consistent performance boost on both standard LSTM and attention-based neural MT architectures.
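A re-ranker can only choose among the candidates it is given, so the abstract pairs it with a decoder that makes the first-pass N-best list more diverse. This page does not spell out that decoder. The sketch below assumes a sibling-rank penalty: during beam search, each child is penalized by γ·k, where k is its rank among the children of the same parent. A single parent therefore cannot fill the whole beam with near-duplicates. The names `diverse_beam_step`, `next_logp`, and `gamma` are hypothetical, and the toy distributions are made-up numbers.

```python
from typing import Callable, Dict, List, Tuple

# A beam entry: (token prefix, cumulative log-probability of that prefix).
Beam = Tuple[Tuple[str, ...], float]


def diverse_beam_step(
    beams: List[Beam],
    next_logp: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int,
    gamma: float = 1.0,
) -> List[Beam]:
    """Expand every beam by one token and keep the best `beam_size` children,
    ranking children by log-probability minus gamma * (rank among siblings)."""
    candidates = []  # (selection score, child beam)
    for prefix, score in beams:
        # Sort this parent's children from most to least likely.
        children = sorted(next_logp(prefix).items(), key=lambda kv: kv[1], reverse=True)
        for k, (token, logp) in enumerate(children):
            child_score = score + logp
            # The k-th best sibling (k = 0 for the best) is penalized by gamma * k,
            # so lower-ranked siblings compete less well against other parents' children.
            candidates.append((child_score - gamma * k, (prefix + (token,), child_score)))
    candidates.sort(key=lambda c: c[0], reverse=True)
    # Survivors keep their unpenalized cumulative log-probability.
    return [beam for _, beam in candidates[:beam_size]]


# Toy next-token distributions (made-up numbers) for two parent prefixes.
toy_next = {
    ("a",): {"x": -0.1, "y": -0.2, "z": -0.3},
    ("b",): {"x": -0.5},
}
beams = [(("a",), -0.1), (("b",), -0.5)]

plain = diverse_beam_step(beams, toy_next.__getitem__, beam_size=2, gamma=0.0)
diverse = diverse_beam_step(beams, toy_next.__getitem__, beam_size=2, gamma=1.0)
print([p for p, _ in plain])    # both survivors extend ("a",)
print([p for p, _ in diverse])  # one survivor from each parent
```

In this sketch the penalty only affects which children survive; the stored scores stay true log-probabilities. That design choice is an assumption. Carrying the penalized score forward is an equally plausible variant.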


Code & Models

Repositories

hsgodhia/hred (PyTorch)


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory