Inference Strategies for Machine Translation with Conditional Masking

Julia Kreutzer; George Foster; Colin Cherry

arXiv:2010.02352 · cs.CL · October 21, 2020

TL;DR

This paper studies inference strategies for conditional masked language models (CMLMs) in machine translation and identifies a thresholding heuristic that has advantages over the standard "mask-predict" algorithm.

Contribution

It formulates masked inference as a factorization of conditional probabilities of partial sequences and identifies a thresholding-based inference heuristic, supported by analyses of its behavior on machine translation tasks.

Findings

01

A thresholding strategy has advantages over the standard "mask-predict" algorithm.

02

The behavior of the thresholding strategy is analyzed on machine translation tasks.

03

Formulating masked inference as a factorization of conditional probabilities of partial sequences does not harm performance.

Abstract

Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard "mask-predict" algorithm, and provide analyses of its behavior on machine translation tasks.
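
To make the idea of threshold-based masked inference concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the exact procedure, so the commit rule (unmask every position whose confidence reaches a threshold `tau`, and fall back to the single most confident position so that each pass makes progress) is one plausible reading. The names `threshold_decode`, `toy_predict`, `REFERENCE`, and `BASE_CONFIDENCE` are hypothetical, and the toy predictor stands in for a trained CMLM.

```python
from typing import Callable, List, Optional, Tuple

Target = List[Optional[str]]  # None marks a masked target position
# Assumed predictor interface: one (token, confidence) pair per position.
Predictor = Callable[[Target], List[Tuple[str, float]]]


def threshold_decode(predict: Predictor, length: int, tau: float,
                     max_calls: int = 10) -> Tuple[List[str], int]:
    """Fill a fully masked target; return the tokens and the number of passes."""
    target: Target = [None] * length
    calls = 0
    while any(tok is None for tok in target):
        masked = [i for i, tok in enumerate(target) if tok is None]
        preds = predict(target)
        calls += 1
        if calls >= max_calls:
            # Pass budget exhausted: commit every remaining position.
            chosen = masked
        else:
            # Commit all masked positions whose confidence reaches tau.
            chosen = [i for i in masked if preds[i][1] >= tau]
            if not chosen:
                # Assumed fallback: commit the single most confident position.
                chosen = [max(masked, key=lambda i: preds[i][1])]
        for i in chosen:
            target[i] = preds[i][0]
    return [tok for tok in target if tok is not None], calls


# Hypothetical stand-in for a CMLM: confidence grows as more context is filled.
REFERENCE = ["das", "ist", "gut"]
BASE_CONFIDENCE = [0.9, 0.6, 0.3]


def toy_predict(target: Target) -> List[Tuple[str, float]]:
    filled = sum(tok is not None for tok in target)
    return [(REFERENCE[i], min(1.0, BASE_CONFIDENCE[i] + 0.2 * filled))
            for i in range(len(target))]


tokens_low, passes_low = threshold_decode(toy_predict, 3, tau=0.5)
tokens_high, passes_high = threshold_decode(toy_predict, 3, tau=0.85)
```

In this toy setting a lower threshold commits more tokens per pass and finishes in fewer passes, while a higher threshold commits more cautiously. Unlike a fixed unmasking schedule, the number of passes here depends on the model's confidence for the given input.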
