# Mask-Predict: Parallel Decoding of Conditional Masked Language Models

**Authors:** Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer

arXiv: 1904.09324 · 2019-09-05

## TL;DR

This paper introduces Mask-Predict, a parallel decoding algorithm for conditional masked language models. It first predicts all target words at once, then repeatedly masks and regenerates the words the model is least confident about, narrowing the quality gap to autoregressive translation models while decoding faster.

## Contribution

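The paper trains a conditional masked language model to predict any subset of the target words, given the source text and a partially masked translation, and pairs it with a parallel decoding algorithm that runs for a constant number of iterations. The combination improves on previous non-autoregressive and parallel decoding translation models.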

## Key findings

- Improves on the previous state of the art for non-autoregressive and parallel decoding translation models by over 4 BLEU on average.
- Reaches within about 1 BLEU point of a typical left-to-right transformer model.
- Decodes significantly faster than autoregressive counterparts.

## Abstract

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
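The decoding loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `mask_predict`, `toy_predict`, and the `MASK` symbol are hypothetical names, the target length is passed in by the caller, and the linearly decaying number of re-masked tokens is an assumption of this sketch (the abstract only specifies a constant number of iterations). `toy_predict` is a stand-in for a trained model and simply upper-cases the source token at each position.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical mask symbol used by this sketch

# A predictor maps (source, partially masked target) to one
# (token, probability) pair per target position.
Predictor = Callable[[List[str], List[str]], List[Tuple[str, float]]]


def mask_predict(source: List[str], length: int, predict: Predictor,
                 iterations: int = 4) -> List[str]:
    """Iteratively fill in and refine a target sequence of fixed length."""
    tokens = [MASK] * length   # start with every target position masked
    probs = [0.0] * length     # confidence of the token at each position

    for t in range(iterations):
        predictions = predict(source, tokens)
        # Only masked positions are updated; unmasked tokens keep their scores.
        for i, (tok, p) in enumerate(predictions):
            if tokens[i] == MASK:
                tokens[i], probs[i] = tok, p

        # Assumed schedule: the number of re-masked tokens decays linearly to 0.
        n_mask = length * (iterations - t - 1) // iterations
        if n_mask == 0:
            break

        # Re-mask the positions the model is least confident about.
        least_confident = sorted(range(length), key=lambda i: probs[i])[:n_mask]
        for i in least_confident:
            tokens[i] = MASK

    return tokens


def toy_predict(source: List[str], target: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a trained conditional masked language model."""
    visible = sum(tok != MASK for tok in target)
    confidence = (1 + visible) / (1 + len(target))  # more context, more confidence
    return [(source[i].upper(), confidence) for i in range(len(target))]


result = mask_predict(["a", "b", "c"], length=3, predict=toy_predict, iterations=3)
print(result)
```

Each iteration makes a single parallel call to the predictor, so the number of model calls is fixed by `iterations` rather than by the length of the target sequence.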

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.09324/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1904.09324/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1904.09324/full.md

---
Source: https://tomesphere.com/paper/1904.09324