MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation
Pan Xie, Zexian Li, Xiaohui Hu

TL;DR
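This paper introduces MvSR, a regularization technique for non-autoregressive machine translation (NAT). It enforces consistent predictions at mask positions shared across differently masked views of the same target, and between the online model and an exponential moving average of its weights, improving BLEU by 0.36-1.14 over previous NAT models.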
Contribution
The paper proposes Multi-view Subset Regularization (MvSR), a regularization method that improves CMLM-based NAT models without changing their architecture.
Findings
Achieves 0.36-1.14 BLEU improvements over previous NAT models.
Reduces the performance gap to the Transformer baseline to 0.01-0.44 BLEU on small datasets.
Demonstrates effectiveness across three public benchmarks.
Abstract
Conditional masked language models (CMLM) have shown impressive progress in non-autoregressive machine translation (NAT). They learn the conditional translation model by predicting a randomly masked subset of the target sentence. Based on the CMLM framework, we introduce Multi-view Subset Regularization (MvSR), a novel regularization method to improve the performance of the NAT model. Specifically, MvSR consists of two parts: (1) shared mask consistency: we forward the same target with different mask strategies and encourage the predictions at shared mask positions to be consistent with each other; (2) model consistency: we maintain an exponential moving average of the model weights and enforce consistent predictions between the average model and the online model. Without changing the CMLM-based architecture, our approach achieves remarkable performance on…
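The two consistency terms described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The abstract only says the predictions should be "consistent", so the symmetric KL divergence used here is an assumption, as are all function names, the per-position averaging, and the default decay value.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def shared_mask_consistency(logits_a, logits_b, mask_a, mask_b):
    """Symmetric KL between two views, averaged over positions masked in both.

    logits_a, logits_b: per-position vocabulary logits from two forward passes
    of the same target under different masks. mask_a, mask_b: sets of masked
    positions. The symmetric-KL choice is an assumption of this sketch.
    """
    shared = sorted(mask_a & mask_b)
    if not shared:
        return 0.0
    total = 0.0
    for pos in shared:
        p = softmax(logits_a[pos])
        q = softmax(logits_b[pos])
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(shared)


def model_consistency(online_logits, average_logits, mask):
    """Same divergence, but between the online model and the averaged model
    evaluated on one shared mask (an assumption of this sketch)."""
    return shared_mask_consistency(online_logits, average_logits, mask, mask)


def ema_update(average_weights, online_weights, decay=0.999):
    """One exponential-moving-average step over a flat list of weights.
    The decay value is illustrative, not taken from the paper."""
    return [
        decay * avg + (1.0 - decay) * cur
        for avg, cur in zip(average_weights, online_weights)
    ]
```

In a training loop, terms like these would be added to the usual CMLM cross-entropy loss with weighting coefficients, and `ema_update` would be called after each optimizer step. Both the weighting and the update schedule are left open here because the excerpt above does not specify them.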
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Label Smoothing · Softmax · Byte Pair Encoding
