Amortized Noisy Channel Neural Machine Translation
Richard Yuanzhe Pang, He He, Kyunghyun Cho

TL;DR
This paper studies whether a noisy channel neural machine translation model can be amortized so that greedy decoding matches the translation quality of beam search and rerank (BSR) with much faster inference. It tries three training approaches: knowledge distillation, one-step-deviation imitation learning, and Q learning.
Contribution
It trains an amortized noisy channel NMT model with three approaches (knowledge distillation, one-step-deviation imitation learning, and Q learning) that speed up inference while keeping translation quality comparable to BSR.
Findings
All three approaches speed up inference by 10-100 times.
Translation quality, measured by BLEU and BLEURT, remains comparable to beam search and rerank (BSR).
The rewards achieved by the generated translations do not match those of BSR, so the speedup comes with a trade-off in reward.
Abstract
Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches like "beam search and rerank" (BSR) incur significant computation overhead during inference, making real-world application infeasible. We aim to study if it is possible to build an amortized noisy channel NMT model such that when we do greedy decoding during inference, the translation accuracy matches that of BSR in terms of reward (based on the source-to-target log probability and the target-to-source log probability) and quality (based on BLEU and BLEURT). We attempt three approaches to train the new model: knowledge distillation, one-step-deviation imitation learning, and Q learning. The first approach obtains the noisy channel signal from a pseudo-corpus, and the latter two approaches aim to optimize toward a noisy-channel MT reward directly. For all three approaches,…
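To make the "beam search and rerank" idea in the abstract concrete, here is a minimal sketch of the reranking step. It assumes the reward is the source-to-target log probability plus a weighted target-to-source log probability. The function names, the `weight` parameter, and the toy score tables are hypothetical illustrations, not the paper's implementation or its numbers.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer type: takes (condition, output) and returns a log probability.
LogProbFn = Callable[[str, str], float]


def rerank_noisy_channel(
    source: str,
    candidates: List[str],
    forward_logprob: LogProbFn,   # assumed to return log p(target | source)
    reverse_logprob: LogProbFn,   # assumed to return log p(source | target)
    weight: float = 1.0,          # assumed weight on the reverse (channel) term
) -> Tuple[str, float]:
    """Return the candidate with the highest noisy channel reward and that reward."""
    best_candidate, best_reward = candidates[0], float("-inf")
    for candidate in candidates:
        reward = forward_logprob(source, candidate) + weight * reverse_logprob(candidate, source)
        if reward > best_reward:
            best_candidate, best_reward = candidate, reward
    return best_candidate, best_reward


# Toy score tables standing in for real models (made-up values for illustration).
_FORWARD = {"a cat": -1.0, "the cat": -1.2}
_REVERSE = {"a cat": -3.0, "the cat": -1.5}


def toy_forward(source: str, target: str) -> float:
    return _FORWARD[target]


def toy_reverse(target: str, source: str) -> float:
    return _REVERSE[target]


if __name__ == "__main__":
    # In BSR the candidates would come from beam search with the forward model.
    best, reward = rerank_noisy_channel("le chat", ["a cat", "the cat"], toy_forward, toy_reverse)
    print(best, reward)
```

In this toy setup the forward model alone prefers "a cat", while the combined reward prefers "the cat". Every candidate needs a pass through the reverse model, which is the kind of inference overhead the amortized model is meant to avoid by decoding greedily with a single network.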
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
