Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, Jie Zhou

TL;DR
This paper introduces a Bag-of-Ngrams difference training objective for Non-Autoregressive Neural Machine Translation, improving translation quality by better modeling target dependencies and outperforming baselines on multiple translation tasks.
Contribution
It proposes a differentiable Bag-of-Ngrams loss function for NAT, which improves dependency modeling and translation quality over word-level cross-entropy training.
Findings
Outperforms baseline by about 5 BLEU on WMT14 En-De
Achieves about 2.5 BLEU improvement on WMT16 En-Ro
Effectively captures target-side sequential dependency
Abstract
Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup through generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot model the target-side sequential dependency properly, leading to its weak correlation with the translation quality. As a result, NAT tends to generate disfluent translations with over-translation and under-translation errors. In this paper, we propose to train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The bag-of-ngrams training objective is differentiable and can be efficiently calculated, which encourages NAT to capture the target-side sequential dependency and correlates well with the translation quality. We validate our approach on three translation tasks and show that our…
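To make the idea of a bag-of-ngrams difference concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the difference is measured as an L1 distance between n-gram counts, and that the model predicts each target position independently, so the expected count of an n-gram is a sum over start positions of a product of per-position probabilities. The function names (`ngram_bag`, `bon_l1_discrete`, `expected_ngram_count`, `bon_l1_expected`) and the list-of-dicts representation of the model output are hypothetical choices made for illustration.

```python
from collections import Counter


def ngram_bag(tokens, n):
    """Count every n-gram (as a tuple) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bon_l1_discrete(hyp, ref, n):
    """L1 distance between the n-gram bags of two concrete sentences."""
    hyp_bag, ref_bag = ngram_bag(hyp, n), ngram_bag(ref, n)
    return sum(abs(hyp_bag[g] - ref_bag[g]) for g in set(hyp_bag) | set(ref_bag))


def expected_ngram_count(probs, gram):
    """Expected count of `gram` when each position is predicted independently.

    probs: one dict per target position, mapping token -> probability
           (assumed to sum to 1 at each position).
    """
    n = len(gram)
    total = 0.0
    for start in range(len(probs) - n + 1):
        p = 1.0
        for offset, token in enumerate(gram):
            p *= probs[start + offset].get(token, 0.0)
        total += p
    return total


def bon_l1_expected(probs, ref, n):
    """L1 distance between the expected model bag and the reference bag.

    Uses |a - b| = a + b - 2 * min(a, b). Because min(.) is zero for every
    n-gram absent from the reference, only reference n-grams are enumerated.
    """
    ref_bag = ngram_bag(ref, n)
    overlap = sum(
        min(expected_ngram_count(probs, gram), count)
        for gram, count in ref_bag.items()
    )
    model_total = max(len(probs) - n + 1, 0)  # expected counts sum to this
    ref_total = sum(ref_bag.values())
    return model_total + ref_total - 2.0 * overlap


if __name__ == "__main__":
    ref = ["a", "b", "c"]
    # A repeated token (an over-translation-style error) changes the bigram bag.
    print(bon_l1_discrete(["a", "b", "b"], ref, 2))
    # A soft model output: position 1 is unsure between "b" and "c".
    soft = [{"a": 1.0}, {"b": 0.5, "c": 0.5}, {"c": 1.0}]
    print(bon_l1_expected(soft, ref, 2))
```

Because the expected version is built only from sums, products, and `min` over probabilities, the same computation written in an autodiff framework would yield gradients with respect to the model's output distribution, which is the property the abstract describes as differentiable.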
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
