Aligned Cross Entropy for Non-Autoregressive Machine Translation

Marjan Ghazvininejad; Vladimir Karpukhin; Luke Zettlemoyer; Omer Levy

arXiv:2004.01655 · cs.CL · April 6, 2020 · 68 cites

TL;DR

This paper introduces aligned cross entropy (AXE), a loss function for non-autoregressive machine translation that scores model predictions under the best monotonic alignment with the target sequence. The alignment is found by a differentiable dynamic program, and training with AXE improves translation quality.

Contribution

The paper proposes AXE, a loss function that makes the training of non-autoregressive models less sensitive to small shifts in word order, leading to state-of-the-art results among non-autoregressive models.
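To make the word-order problem concrete, here is a toy illustration of how standard position-wise cross entropy scores a prediction that contains the right words in the right relative order but one slot late. The vocabulary, the probabilities, and the helper names (`peaked`, `positionwise_xent`) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

VOCAB = ["the", "cat", "sat", "down"]

def peaked(token, p=0.9):
    """Hypothetical helper: probability p on `token`, the rest spread evenly."""
    rest = (1.0 - p) / (len(VOCAB) - 1)
    return {w: (p if w == token else rest) for w in VOCAB}

def positionwise_xent(target, preds):
    """Standard cross entropy: target token i is scored only against prediction i."""
    assert len(target) == len(preds)
    return -sum(math.log(dist[tok]) for tok, dist in zip(target, preds))

target = ["the", "cat", "sat", "down"]
exact = [peaked(t) for t in target]                          # predicts "the cat sat down"
shifted = [peaked(t) for t in ["the", "the", "cat", "sat"]]  # "the" repeated, "cat sat" one slot late

print(round(positionwise_xent(target, exact), 2))    # 0.42
print(round(positionwise_xent(target, shifted), 2))  # 10.31
```

Because every target token is compared only with the prediction at the same index, the one-slot shift mismatches three of the four positions and the loss grows more than twentyfold.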

Findings

01

AXE-based training of conditional masked language models (CMLMs) substantially improves translation quality on major WMT benchmarks.

02

AXE-trained CMLMs set a new state of the art among non-autoregressive models.

03

AXE makes training less sensitive to small shifts in word order, which standard cross entropy penalizes heavily in models without autoregressive factors.

Abstract

Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can highly penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.
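The dynamic program can be sketched in a few lines of plain Python. This is a minimal, non-differentiable illustration under stated assumptions, not the authors' implementation: it assumes each table cell chooses among three moves (align a target token with the next prediction; skip a prediction by charging it for emitting a blank token; skip a target by charging it to the current prediction), a skip-prediction weight `delta`, a blank token `<eps>`, and the boundary conditions noted in the comments. All names (`axe_loss`, `peaked`, `EPS`) are hypothetical. In an autodiff framework such as PyTorch, the same recurrence written over log-probability tensors would pass gradients through the log-probabilities on the selected path.

```python
import math

EPS = "<eps>"  # hypothetical blank token that unaligned predictions should emit
VOCAB = ["the", "cat", "sat", "down", EPS]

def peaked(token, p=0.9):
    """Hypothetical helper: probability p on `token`, the rest spread evenly."""
    rest = (1.0 - p) / (len(VOCAB) - 1)
    return {w: (p if w == token else rest) for w in VOCAB}

def axe_loss(target, preds, delta=1.0):
    """Assumed aligned-cross-entropy recurrence (hard min over monotonic alignments).

    target: n gold tokens; preds: m dicts mapping token -> probability.
    A[i][j] = lowest cost of aligning target[:i] with preds[:j].
    """
    n, m = len(target), len(preds)
    inf = float("inf")
    A = [[inf] * (m + 1) for _ in range(n + 1)]  # A[i][0] stays inf for i > 0 (assumed)
    A[0][0] = 0.0
    for j in range(1, m + 1):
        # Assumed boundary: leading predictions can only be skipped (blank loss).
        A[0][j] = A[0][j - 1] - delta * math.log(preds[j - 1][EPS])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            tok_nll = -math.log(preds[j - 1][target[i - 1]])
            A[i][j] = min(
                A[i - 1][j - 1] + tok_nll,                          # align Y_i with P_j
                A[i][j - 1] - delta * math.log(preds[j - 1][EPS]),  # skip prediction P_j
                A[i - 1][j] + tok_nll,                              # skip target: Y_i also charged to P_j
            )
    return A[n][m]

target = ["the", "cat", "sat", "down"]
exact = [peaked(t) for t in target]
shifted = [peaked(t) for t in [EPS, "the", "cat", "sat"]]  # right-shifted by one slot

print(round(axe_loss(target, exact), 2))    # 0.42
print(round(axe_loss(target, shifted), 2))  # 4.11
```

For the shifted predictions, the cheapest path skips the first prediction (which already favors the blank), aligns "the", "cat", and "sat" with the next three predictions, and charges the unmatched "down" to the last one, for about 4 × 0.105 + 3.689 ≈ 4.11. Position-wise cross entropy on the same predictions mismatches all four positions and costs about 4 × 3.689 ≈ 14.76.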

Code & Models

Repositories

m3yrin/aligned-cross-entropy (PyTorch)

Videos

Aligned Cross Entropy for Non-Autoregressive Machine Translation (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
