One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation
Chenze Shao, Xuanfu Wu, Yang Feng

TL;DR
This paper introduces DDRS (diverse distillation with reference selection), a training method for non-autoregressive translation. It gives the model multiple references per source sentence and selects among them during training, which mitigates the multi-modality problem and improves translation quality.
Contribution
It proposes a novel diverse distillation approach with reference selection, generating multiple high-quality references and selecting the best match during training to enhance NAT performance.
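The selection step described above can be illustrated with a minimal sketch. It assumes the "best match" is the reference with the lowest token-level negative log-likelihood under the model's independent per-position predictions; the paper's actual selection criterion and training loop may differ. The names nll and select_reference, the floor value, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import math


def nll(position_probs, reference, floor=1e-9):
    """Negative log-likelihood of `reference` under independent
    per-position token distributions (a simplified NAT output)."""
    if len(position_probs) != len(reference):
        # Assumption: references whose length differs from the output are skipped.
        return math.inf
    total = 0.0
    for probs, token in zip(position_probs, reference):
        # `floor` avoids log(0) for tokens the model assigns no mass to.
        total -= math.log(probs.get(token, floor))
    return total


def select_reference(position_probs, references):
    """Return (index, loss) of the reference with the lowest loss."""
    losses = [nll(position_probs, ref) for ref in references]
    best = min(range(len(references)), key=losses.__getitem__)
    return best, losses[best]


# Toy model output: two positions, each a distribution over tokens.
position_probs = [
    {"thank": 0.7, "many": 0.3},
    {"you": 0.7, "thanks": 0.3},
]
# Two valid translations of the same (hypothetical) source sentence.
references = [["thank", "you"], ["many", "thanks"]]

best_index, best_loss = select_reference(position_probs, references)
print(best_index, round(best_loss, 3))
```

In this toy case the model already leans toward the first reference, so training against it avoids penalizing the model for not producing a different but equally valid translation.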
Findings
Achieves 29.82 BLEU on WMT14 En-De with one decoding pass
Improves on the previous state of the art for NAT by over 1 BLEU
Demonstrates effectiveness of multiple references and reference selection
Abstract
Non-autoregressive neural machine translation (NAT) suffers from the multi-modality problem: the source sentence may have multiple correct translations, but the loss function is calculated only according to the reference sentence. Sequence-level knowledge distillation makes the target more deterministic by replacing the target with the output from an autoregressive model. However, the multi-modality problem in the distilled dataset is still nonnegligible. Furthermore, learning from a specific teacher limits the upper bound of the model capability, restricting the potential of NAT models. In this paper, we argue that one reference is not enough and propose diverse distillation with reference selection (DDRS) for NAT. Specifically, we first propose a method called SeedDiv for diverse machine translation, which enables us to generate a dataset containing multiple high-quality reference…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
