MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation

Pan Xie; Zexian Li; Xiaohui Hu

arXiv:2108.08447·cs.CL·August 20, 2021

Open Access

TL;DR

This paper introduces MvSR, a regularization technique for non-autoregressive machine translation. It enforces prediction consistency in two ways: across different maskings of the same target sentence, and between the online model and an exponential moving average of its weights. The reported gains are 0.36-1.14 BLEU over previous NAT models.

Contribution

The paper proposes Multi-view Subset Regularization (MvSR), a novel regularization method that improves NAT models without altering their architecture.

Findings

01. Achieves 0.36-1.14 BLEU improvements over previous NAT models.

02. Reduces the performance gap to the Transformer baseline to 0.01-0.44 BLEU on small datasets.

03. Demonstrates effectiveness across three public benchmarks.

Abstract

Conditional masked language models (CMLM) have shown impressive progress in non-autoregressive machine translation (NAT). They learn the conditional translation model by predicting a randomly masked subset of the target sentence. Based on the CMLM framework, we introduce Multi-view Subset Regularization (MvSR), a novel regularization method to improve the performance of the NAT model. Specifically, MvSR consists of two parts: (1) shared mask consistency: we forward the same target with different mask strategies, and encourage the predictions at shared mask positions to be consistent with each other; (2) model consistency: we maintain an exponential moving average of the model weights, and enforce the predictions to be consistent between the average model and the online model. Without changing the CMLM-based architecture, our approach achieves remarkable performance on…
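The two consistency terms described in the abstract can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the function names, the symmetric KL divergence used for shared mask consistency, the one-directional KL used for model consistency, and the decay value are all assumptions chosen for illustration. Predictions are represented as one probability distribution over the vocabulary per target position.

```python
import math


def ema_update(avg_weights, online_weights, decay=0.999):
    """Update the averaged weights as an exponential moving average.

    The decay value is an assumed default, not taken from the paper.
    """
    return [decay * a + (1.0 - decay) * w
            for a, w in zip(avg_weights, online_weights)]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def shared_mask_consistency(probs_a, probs_b, mask_a, mask_b):
    """Mean symmetric KL over positions masked in BOTH views.

    probs_a / probs_b: per-position distributions from two forward passes
    of the same target under different masks (mask_a, mask_b are lists of
    masked position indices). Symmetric KL is an assumed choice.
    """
    shared = sorted(set(mask_a) & set(mask_b))
    if not shared:
        return 0.0
    total = sum(0.5 * (kl(probs_a[i], probs_b[i]) + kl(probs_b[i], probs_a[i]))
                for i in shared)
    return total / len(shared)


def model_consistency(online_probs, avg_probs, masked):
    """Mean KL(average model || online model) over masked positions.

    The direction of the divergence is an assumption for this sketch.
    """
    if not masked:
        return 0.0
    return sum(kl(avg_probs[i], online_probs[i]) for i in masked) / len(masked)


if __name__ == "__main__":
    # Toy target of length 3 with a vocabulary of size 2.
    view_a = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
    view_b = [[0.5, 0.5], [0.6, 0.4], [0.2, 0.8]]
    # Position 1 is the only position masked in both views.
    print(shared_mask_consistency(view_a, view_b, [0, 1], [1, 2]))
    print(model_consistency(view_a, view_b, [1]))
    print(ema_update([1.0, 2.0], [0.0, 0.0], decay=0.9))
```

In training, terms like these would typically be added to the usual masked-token cross-entropy loss with weighting coefficients; the weighting scheme is not specified in the text above and is left out here.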

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Label Smoothing · Softmax · Byte Pair Encoding