Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision

Chenyang Huang; Hao Zhou; Osmar R. Zaïane; Lili Mou; Lei Li

arXiv:2110.07515 · cs.CL · October 15, 2021 · 23 cites

TL;DR

This paper introduces DSLP, a non-autoregressive translation model with deep supervision and layer-wise predictions, achieving high translation quality and efficiency, even surpassing autoregressive models on some tasks.

Contribution

The paper proposes DSLP, a non-autoregressive Transformer that is trained with deep supervision and fed additional layer-wise predictions, improving BLEU over the respective base models while keeping non-autoregressive inference speed.

Findings

01

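Consistently improves BLEU over the respective base models on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO).

02

The best variant runs inference 14.8 times faster than the autoregressive model.

03

The best variant outperforms the autoregressive model on three of the four translation tasks.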

Abstract

How do we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient. Recent non-autoregressive translation models speed up the inference, but their quality is still inferior. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and feed additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves the BLEU scores compared with respective base models. Specifically, our best variant outperforms the autoregressive model on three translation tasks, while being 14.8…
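
The key insight in the abstract (deep supervision plus layer-wise predictions) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The toy layer (a residual tanh map standing in for a Transformer decoder layer), the additive fusion of predicted-token embeddings, the output projection shared across layers, the unweighted mean of per-layer losses, and every name and size below are assumptions made only for illustration.

```python
import math
import random

random.seed(0)

VOCAB, DIM, LAYERS = 5, 4, 3  # toy sizes, chosen only for illustration


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters: one toy matrix per decoder layer, plus an
# embedding table and an output projection shared by all layers (assumption).
layer_weights = [rand_matrix(DIM, DIM) for _ in range(LAYERS)]
embedding = rand_matrix(VOCAB, DIM)
out_proj = rand_matrix(VOCAB, DIM)


def decode(states, targets=None):
    """Process all positions at once; return per-layer predictions and mean loss."""
    per_layer_preds, per_layer_loss = [], []
    for weights in layer_weights:
        # Stand-in for a decoder layer: residual connection + tanh(linear).
        states = [[h + math.tanh(u) for h, u in zip(s, matvec(weights, s))]
                  for s in states]
        probs = [softmax(matvec(out_proj, s)) for s in states]
        preds = [max(range(VOCAB), key=p.__getitem__) for p in probs]
        per_layer_preds.append(preds)
        if targets is not None:
            # Deep supervision: every layer gets its own cross-entropy loss.
            nll = [-math.log(p[t]) for p, t in zip(probs, targets)]
            per_layer_loss.append(sum(nll) / len(nll))
        # Layer-wise prediction: pass this layer's predicted tokens upward.
        # Assumption: fuse by adding the token embedding to the hidden state.
        states = [[h + e for h, e in zip(s, embedding[y])]
                  for s, y in zip(states, preds)]
    loss = sum(per_layer_loss) / LAYERS if targets is not None else None
    return per_layer_preds, loss


# Usage: six target positions decoded in parallel, with made-up targets.
initial = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]
targets = [0, 1, 2, 3, 4, 0]
preds, loss = decode(initial, targets)
print("final-layer prediction:", preds[-1])
print("mean per-layer loss:", loss)
```

Each position is handled independently inside a layer, so no position waits for a previously decoded word; this is the property that lets non-autoregressive models avoid the one-by-one decoding the abstract calls inefficient. At inference time only the last layer's prediction (`preds[-1]`) would be used as the output.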

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Absolute Position Encodings · Softmax · Residual Connection · Adam · Label Smoothing · Byte Pair Encoding