Deep Architectures for Neural Machine Translation

Antonio Valerio Miceli Barone, Jindřich Helcl, Rico Sennrich, Barry Haddow, and Alexandra Birch

arXiv:1707.07631·cs.CL·July 25, 2017

TL;DR

This paper systematically compares various deep neural architectures for machine translation, introduces novel variants including BiDeep RNNs, and demonstrates improved translation quality and speed on English-German translation tasks.

Contribution

It provides a comprehensive evaluation of existing deep architectures, proposes the BiDeep RNN as a new approach, and shows its effectiveness in neural machine translation.

Findings

1. The BiDeep RNN achieves a 1.5 BLEU improvement over shallow baselines.
2. Several of the proposed architectures improve upon existing approaches in both translation quality and speed.
3. Deeper models outperform their shallower counterparts on the translation task.

Abstract

It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel "BiDeep" RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English to German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain best…
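The two kinds of depth named in the abstract differ in where the extra layers sit. A deep transition RNN applies several cells inside each time step, and only the first cell sees the input. A stacked RNN feeds the whole output sequence of one recurrent layer into the next. A BiDeep-style stack combines both. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain tanh cell in place of the gated cells a real NMT system would use, and it omits attention, bidirectionality, and every other detail of the paper's models. All names (`TanhCell`, `DeepTransitionRNN`, `stacked`, `bideep`) are hypothetical.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    """Small random weights; real models would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TanhCell:
    """Hypothetical stand-in for a gated cell: h' = tanh(W x + U h)."""

    def __init__(self, rng: random.Random, in_dim: int, hid_dim: int, takes_input: bool = True):
        self.W: Optional[Matrix] = rand_matrix(rng, hid_dim, in_dim) if takes_input else None
        self.U: Matrix = rand_matrix(rng, hid_dim, hid_dim)

    def __call__(self, h: Vector, x: Optional[Vector] = None) -> Vector:
        pre = matvec(self.U, h)
        if self.W is not None and x is not None:
            pre = [a + b for a, b in zip(pre, matvec(self.W, x))]
        return [math.tanh(p) for p in pre]


class DeepTransitionRNN:
    """One recurrent layer whose state passes through `depth` cells per time step.
    Only the first cell reads x_t; the remaining cells only refine the state."""

    def __init__(self, rng: random.Random, in_dim: int, hid_dim: int, depth: int):
        self.hid_dim = hid_dim
        self.cells = [TanhCell(rng, in_dim, hid_dim, takes_input=(i == 0)) for i in range(depth)]

    def run(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hid_dim
        outputs = []
        for x in xs:
            h = self.cells[0](h, x)      # transition step that consumes the input
            for cell in self.cells[1:]:  # extra transition steps, no input
                h = cell(h)
            outputs.append(h)
        return outputs


def stacked(layers: List[DeepTransitionRNN], xs: List[Vector]) -> List[Vector]:
    """Stacked RNN: each layer reads the full output sequence of the layer below."""
    seq = xs
    for layer in layers:
        seq = layer.run(seq)
    return seq


def bideep(rng: random.Random, in_dim: int, hid_dim: int,
           stack_depth: int, transition_depth: int) -> List[DeepTransitionRNN]:
    """BiDeep-style stack: every stacked layer is itself a deep transition RNN."""
    return [
        DeepTransitionRNN(rng, in_dim if i == 0 else hid_dim, hid_dim, transition_depth)
        for i in range(stack_depth)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]  # 5 time steps, 3 features
    encoder = bideep(rng, in_dim=3, hid_dim=4, stack_depth=2, transition_depth=2)
    out = stacked(encoder, xs)
    print(len(out), len(out[0]))  # 5 4
```

Setting `transition_depth=1` reduces the sketch to an ordinary stacked RNN, and `stack_depth=1` to a single deep transition layer, which makes the two axes of depth easy to compare in isolation.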
