The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

Mia Xu Chen; Orhan Firat; Ankur Bapna; Melvin Johnson; Wolfgang Macherey; George Foster; Llion Jones; Niki Parmar; Mike Schuster; Zhifeng Chen; Yonghui Wu; Macduff Hughes

arXiv:1804.09849·cs.CL·April 30, 2018

TL;DR

This paper analyzes recent advances in neural machine translation architectures, isolates the key techniques behind them, and develops hybrid models that outperform the existing state of the art on the WMT'14 benchmarks.

Contribution

The paper introduces RNMT+, an RNN-based model that adopts key modeling and training techniques from newer architectures, and proposes hybrid architectures that combine the strengths of the different models to improve translation quality further.

Findings

01

RNMT+ outperforms RNN, CNN, and Transformer models on WMT'14 benchmarks.

02

Hybrid architectures surpass RNMT+ in translation quality.

03

Key modeling and training techniques are transferable across architectures.

Abstract

The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first out-performed by the convolutional seq2seq model, which was then out-performed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques, and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all of the three fundamental architectures on the benchmark WMT'14 English to French and English to German tasks. Second, we analyze the properties of each fundamental seq2seq…

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing