Rephrasing the Reference for Non-Autoregressive Machine Translation
Chenze Shao, Jinchao Zhang, Jie Zhou, Yang Feng

TL;DR
This paper introduces a rephraser that gives non-autoregressive machine translation (NAT) models a better training target, mitigating the multi-modality problem. The best variant performs comparably to the autoregressive Transformer while being more efficient at inference.
Contribution
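The paper proposes a rephraser that rewrites the reference sentence according to the NAT output and uses the result as the training target. The rephraser is optimized with reinforcement learning, using rewards for fitting the NAT output and for staying close to the reference.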
Findings
Consistently improves NAT translation quality on major WMT benchmarks and across NAT baselines.
The best variant achieves performance comparable to the autoregressive Transformer.
The best variant is 14.7 times more efficient at inference than the autoregressive Transformer.
Abstract
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem that there may exist multiple possible translations of a source sentence, so the reference sentence may be inappropriate for the training when the NAT output is closer to other translations. In response to this problem, we introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output. As we train NAT based on the rephraser output rather than the reference sentence, the rephraser output should fit well with the NAT output and not deviate too far from the reference, which can be quantified as reward functions and optimized by reinforcement learning. Experiments on major WMT benchmarks and NAT baselines show that our approach consistently improves the translation quality of NAT. Specifically, our best variant achieves…
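The two requirements the abstract places on the rephraser output (fit the NAT output, stay close to the reference) can be illustrated with a small sketch. This is not the authors' implementation: the token-level F1 similarity, the weight `alpha`, and the function names `token_f1`, `rephraser_reward`, and `reinforce_loss` are all assumptions chosen for illustration, and the paper's actual reward functions may differ.

```python
from collections import Counter


def token_f1(hyp, ref):
    """Token-level F1 between two token lists (an assumed stand-in similarity)."""
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def rephraser_reward(rephrased, nat_output, reference, alpha=0.5):
    """Hypothetical reward: weighted sum of two similarity terms.

    fit      -- how well the rephrased target matches the NAT output
    faithful -- how close the rephrased target stays to the reference
    alpha    -- assumed trade-off weight between the two terms
    """
    fit = token_f1(rephrased, nat_output)
    faithful = token_f1(rephrased, reference)
    return alpha * fit + (1.0 - alpha) * faithful


def reinforce_loss(log_prob, reward, baseline=0.0):
    """REINFORCE-style surrogate loss for one sampled rephrasing.

    Minimizing it raises the log-probability of samples whose reward
    exceeds the baseline and lowers it for samples below the baseline.
    """
    return -(reward - baseline) * log_prob


reference = "the cat sat on the mat".split()
nat_output = "a cat sat on a mat".split()
sampled = "a cat sat on the mat".split()  # one hypothetical rephraser sample

reward = rephraser_reward(sampled, nat_output, reference)
print(reward, reinforce_loss(log_prob=-2.0, reward=reward, baseline=0.25))
```

A weighted sum is only one way to combine the two terms; the point of the sketch is that a rephrasing unrelated to both the NAT output and the reference earns no reward, while one that stays near both earns a high one.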
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Layer Normalization · Adam · Absolute Position Encodings · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Linear Layer
