Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Jungo Kasai; Nikolaos Pappas; Hao Peng; James Cross; Noah A. Smith

arXiv:2006.10369 · cs.CL · June 28, 2021 · 31 cites

TL;DR

This paper demonstrates that, given a sufficiently deep encoder, an autoregressive model with a single-layer decoder can outperform strong non-autoregressive models in translation quality without sacrificing speed, challenging the assumed quality-speed tradeoff.

Contribution

It shows that varying encoder and decoder depth can substantially speed up autoregressive baselines without loss in accuracy, and that a proper evaluation changes the picture of the non-autoregressive vs. autoregressive tradeoff.

Findings

01

Given a sufficiently deep encoder, a single-layer autoregressive decoder can outperform strong non-autoregressive models.

02

The speed disadvantage of autoregressive models reported in earlier comparisons was overestimated.

03

Proper evaluation protocols reveal autoregressive models can be faster and more accurate.

Abstract

Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where generation is sequential, the former allows generation to be parallelized across target token positions. Some of the latest non-autoregressive models have achieved impressive translation quality-speed tradeoffs compared to autoregressive baselines. In this work, we reexamine this tradeoff and argue that autoregressive baselines can be substantially sped up without loss in accuracy. Specifically, we study autoregressive models with encoders and decoders of varied depths. Our extensive experiments show that given a sufficiently deep encoder, a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable…
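The depth reallocation described in the abstract can be illustrated with a rough counting sketch. It is not the authors' code or benchmark: the function `sequential_layer_steps`, the layer configurations, the target length, and the number of refinement iterations are all illustrative assumptions. The sketch only counts layer applications that must run one after another, under the simplifying assumption that the encoder runs once in parallel over source tokens, an autoregressive decoder runs once per target token, and a non-autoregressive decoder runs once per refinement iteration.

```python
def sequential_layer_steps(encoder_layers, decoder_layers, target_length,
                           autoregressive=True, refinement_iterations=1):
    """Count layer applications that cannot be parallelized with each other.

    Toy cost model (an assumption, not a measured latency): every layer
    application costs one unit, and positions within a layer run in parallel.
    """
    # The encoder is applied once; all source positions run in parallel.
    encoder_steps = encoder_layers
    if autoregressive:
        # Sequential generation: the full decoder stack runs per target token.
        decoder_steps = decoder_layers * target_length
    else:
        # Parallel generation: the decoder stack runs once per refinement pass.
        decoder_steps = decoder_layers * refinement_iterations
    return encoder_steps + decoder_steps


TARGET_LENGTH = 25  # hypothetical sentence length in tokens

# Hypothetical configurations with the same total number of layers (12).
balanced_ar = sequential_layer_steps(6, 6, TARGET_LENGTH)
deep_shallow_ar = sequential_layer_steps(12, 1, TARGET_LENGTH)
# Hypothetical iterative non-autoregressive model with 4 refinement passes.
iterative_nar = sequential_layer_steps(6, 6, TARGET_LENGTH,
                                       autoregressive=False,
                                       refinement_iterations=4)

print(balanced_ar, deep_shallow_ar, iterative_nar)  # 156 37 30
```

Under these assumptions, moving layers from the decoder to the encoder cuts the sequential work of the autoregressive model from 156 to 37 units, close to the 30 units of the hypothetical non-autoregressive model. Real latency also depends on per-layer cost, caching, batching, and hardware, which this count ignores.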

Code & Models

Videos

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation · slideslive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings