Non-Autoregressive Machine Translation with Disentangled Context Transformer

Jungo Kasai; James Cross; Marjan Ghazvininejad; Jiatao Gu

arXiv:2001.05136 · cs.CL · July 1, 2020 · 51 cites

TL;DR

This paper introduces the DisCo transformer, a non-autoregressive machine translation model that generates all tokens simultaneously using disentangled contexts, leading to faster inference with competitive translation quality.

Contribution

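The paper proposes an attention-masking model, the Disentangled Context (DisCo) transformer, together with a parallel easy-first inference algorithm for non-autoregressive translation. Together they enable parallel token generation while reducing the number of refinement iterations needed at decoding time.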
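
A minimal sketch of the attention-masking idea, assuming pure Python and toy vectors: each target position i gets its own randomly sampled subset of the other reference positions as context, and scaled dot-product attention for query i is restricted to that subset. The function names, the uniform choice of context size, and the zero-vector fallback for an empty context are illustrative assumptions, not the authors' training recipe or the API of facebookresearch/DisCo. In the full encoder-decoder model every position also attends to the source encoding, so an empty target context still carries source information.

```python
import math
import random
from typing import List


def sample_disentangled_mask(n: int, rng: random.Random) -> List[List[bool]]:
    # For each target position i, sample a random subset of the OTHER positions
    # as its observed context (assumption: context size drawn uniformly).
    mask = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        k = rng.randint(0, len(others))
        chosen = set(rng.sample(others, k))
        # mask[i][j] is True when position i may attend to reference token j;
        # the diagonal is always False, so a position never sees its own token.
        mask.append([j in chosen for j in range(n)])
    return mask


def masked_attention(queries, keys, values, mask):
    # Scaled dot-product attention where query i only sees keys j with mask[i][j].
    d = len(queries[0])
    dv = len(values[0])
    out = []
    for i, q in enumerate(queries):
        visible = [j for j in range(len(keys)) if mask[i][j]]
        if not visible:
            # Hypothetical fallback: an empty context contributes a zero vector.
            out.append([0.0] * dv)
            continue
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in visible]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        z = sum(weights)
        out.append([sum(w * values[j][c] for w, j in zip(weights, visible)) / z
                    for c in range(dv)])
    return out
```

Because every row of the mask is sampled independently, one forward pass trains each position against a different context, which is what lets all tokens be predicted in parallel under "different contexts".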

Findings

01

Achieves translation quality competitive with, and in some cases better than, state-of-the-art non-autoregressive models.

02

Significantly reduces decoding time across multiple translation tasks.

03

Demonstrates effectiveness on 7 translation directions with various data sizes.

Abstract

State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts. The DisCo transformer is trained to predict every output token given an arbitrary subset of the other reference tokens. We also develop the parallel easy-first inference algorithm, which iteratively refines every token in parallel and reduces the number of required iterations. Our extensive experiments on 7 translation directions with varying data sizes demonstrate that our model achieves competitive, if not better,…
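
The parallel easy-first loop described in the abstract can be sketched as follows, under stated assumptions: `predict` is a hypothetical stand-in for one DisCo forward pass at a single position, the first pass uses empty contexts, and in each later pass position n is conditioned on the tokens that were predicted with higher probability than n in the previous pass. This is one reading of the easy-first rule, not code from the paper or its repository. Decoding here stops when the output no longer changes or after `max_iters` passes; the Python loop over positions is only for clarity, since the model scores all positions in one parallel pass.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical predictor interface: given a target position and a dict
# {position: token} of observed context tokens, return (token, probability).
Predictor = Callable[[int, Dict[int, str]], Tuple[str, float]]


def parallel_easy_first(predict: Predictor, length: int,
                        max_iters: int = 10) -> Tuple[List[str], int]:
    # Pass 1: every position is predicted with an empty context.
    preds = [predict(n, {}) for n in range(length)]
    iters = 1
    while iters < max_iters:
        tokens = [t for t, _ in preds]
        probs = [p for _, p in preds]
        new_preds = []
        for n in range(length):
            # Context for n: tokens that were "easier" (higher probability)
            # than position n in the previous pass.
            ctx = {m: tokens[m] for m in range(length)
                   if m != n and probs[m] > probs[n]}
            new_preds.append(predict(n, ctx))
        iters += 1
        converged = [t for t, _ in new_preds] == tokens
        preds = new_preds
        if converged:  # assumed stopping rule: no token changed
            break
    return [t for t, _ in preds], iters
```

With a toy predictor where position 1 is only correct once position 0 is in its context, and position 2 only once position 1 is correct, the loop recovers the full sequence in a handful of passes; every pass revises all positions at once rather than adding one token per step as left-to-right decoding would.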

Code & Models

Repositories

facebookresearch/DisCo · Official

Videos

Non-Autoregressive Machine Translation with Disentangled Context Transformer · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax