Inference Strategies for Machine Translation with Conditional Masking
Julia Kreutzer, George Foster, Colin Cherry

TL;DR
This paper explores inference strategies for conditional masked language models (CMLMs) in machine translation and identifies a thresholding heuristic that has advantages over the standard mask-predict algorithm.
Contribution
It formulates masked inference as a factorization of conditional probabilities of partial sequences and identifies a thresholding heuristic motivated by that view, with analyses of its behavior on machine translation tasks.
Findings
A thresholding strategy has advantages over the standard mask-predict algorithm.
The paper analyzes the behavior of the thresholding strategy on machine translation tasks.
Formulating masked inference as a factorization of conditional probabilities of partial sequences does not harm performance.
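One way to read the factorization mentioned above is as the chain rule applied to groups of target positions filled in over successive steps. The notation here is an assumption for illustration and is not taken from the paper: X is the source, Y the target, and M_1, ..., M_T a partition of the target positions into the sets predicted at each step. The second relation reflects the usual CMLM assumption that positions predicted in the same step are conditionally independent.

```latex
\[
p(Y \mid X) \;=\; \prod_{t=1}^{T} p\bigl(Y_{M_t} \mid Y_{M_{<t}}, X\bigr),
\qquad
p\bigl(Y_{M_t} \mid Y_{M_{<t}}, X\bigr) \;\approx\; \prod_{j \in M_t} p\bigl(y_j \mid Y_{M_{<t}}, X\bigr)
\]
```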
Abstract
Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard "mask-predict" algorithm, and provide analyses of its behavior on machine translation tasks.
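To make the contrast between the two inference heuristics concrete, here is a minimal Python sketch of the position-selection step only. The mask-predict rule follows the commonly described linear schedule (re-mask the lowest-confidence positions, with the count shrinking each iteration). The threshold rule is an assumed form, since the abstract does not spell it out: it re-masks every position whose confidence falls below a cutoff `tau`. The function names, `tau`, and the example probabilities are all hypothetical.

```python
def mask_predict_remask(probs, step, total_steps):
    """Mask-predict style selection: re-mask the n least confident positions.

    n decays linearly with the iteration index (step runs from 1 to total_steps),
    so no positions are re-masked on the final step.
    """
    n = int(len(probs) * (total_steps - step) / total_steps)
    # Sort position indices from least to most confident.
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    return set(order[:n])


def threshold_remask(probs, tau):
    """Threshold style selection (assumed form, not confirmed by the source).

    Re-mask every position whose confidence is below tau, regardless of
    how many iterations have run.
    """
    return {i for i, p in enumerate(probs) if p < tau}


if __name__ == "__main__":
    # Hypothetical per-token confidences from one CMLM prediction pass.
    probs = [0.9, 0.2, 0.6, 0.95]
    print(mask_predict_remask(probs, step=1, total_steps=4))
    print(threshold_remask(probs, tau=0.5))
```

Under this sketch, the fixed schedule re-masks three of the four positions on the first of four steps no matter how confident the model is, while the threshold rule re-masks only the single position below 0.5. The number of positions revisited per step therefore depends on model confidence rather than on the iteration count.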
