Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation

Pei Zhang; Boxing Chen; Niyu Ge; Kai Fan

arXiv:2009.09127 · cs.CL · September 22, 2020 · 1 citation

TL;DR

This paper introduces a simple long-short term masking self-attention mechanism for document-level neural machine translation that improves long-range dependency modeling and reduces error propagation, achieving strong BLEU scores.

Contribution

It proposes a novel masking strategy on the standard transformer to enhance document-level translation without increasing model complexity.
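
The masking idea can be illustrated with a minimal sketch. The summary above does not spell out how the long-term and short-term masks are assigned, so the code below assumes a multi-sentence input whose tokens are tagged with sentence indices, a short-term mask that restricts each token to its own sentence, a long-term mask that spans the whole input, and a hypothetical split in which half of the attention heads use each mask. The function name `long_short_masks`, the `sent_ids` encoding, and the head split are illustrative assumptions, not the authors' code.

```python
from typing import List

Mask = List[List[bool]]  # mask[i][j] is True when query i may attend to key j


def long_short_masks(sent_ids: List[int], num_heads: int, causal: bool = False) -> List[Mask]:
    """Build one boolean attention mask per head for a multi-sentence input.

    Hypothetical layout (an assumption, not from the paper): the first
    num_heads // 2 heads get the short-term (same-sentence) mask and the
    remaining heads get the long-term (whole-input) mask.
    Set causal=True for decoder self-attention.
    """
    n = len(sent_ids)

    def allowed(i: int, j: int, local: bool) -> bool:
        if causal and j > i:  # decoder: never attend to future tokens
            return False
        if local and sent_ids[i] != sent_ids[j]:  # short-term: same sentence only
            return False
        return True

    short = [[allowed(i, j, True) for j in range(n)] for i in range(n)]
    long_ = [[allowed(i, j, False) for j in range(n)] for i in range(n)]
    half = num_heads // 2
    # Copy rows so that each head owns an independent mask.
    return [[row[:] for row in (short if h < half else long_)] for h in range(num_heads)]


# Two sentences: tokens 0-1 belong to sentence 0, tokens 2-4 to sentence 1.
masks = long_short_masks([0, 0, 1, 1, 1], num_heads=4)
print(masks[0][3])  # short-term head, query token 3: [False, False, True, True, True]
print(masks[2][3])  # long-term head, query token 3: [True, True, True, True, True]
```

The masks are plain boolean tables with no learned weights, which is consistent with the claim that the approach does not increase model complexity: only the attention pattern of existing heads changes.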

Findings

1. Achieves strong BLEU scores on two datasets
2. Effectively captures discourse phenomena
3. Reduces error propagation in translation

Abstract

Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually at the cost of more parameters and higher computational complexity. However, little attention has been paid to the baseline model. In this paper, we extensively study the pros and cons of the standard transformer in document-level translation and find that its auto-regressive property simultaneously brings the advantage of consistency and the disadvantage of error accumulation. We therefore propose a surprisingly simple long-short term masking self-attention on top of the standard transformer that both effectively captures long-range dependencies and reduces the propagation of errors. We evaluate our approach on two publicly available document-level datasets, where it achieves strong BLEU results and captures discourse phenomena.
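
To see how a mask changes attention without adding parameters, the sketch below applies a boolean mask inside ordinary single-head scaled dot-product attention. Disallowed positions receive a score of negative infinity, so their softmax weight is exactly zero. One plausible reading of the abstract is that heads with a sentence-local (short-term) mask cannot draw on earlier, possibly mistranslated context, while full-context (long-term) heads preserve document-level consistency; this interpretation, the name `masked_attention`, and the toy inputs are assumptions for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_attention(q: Matrix, k: Matrix, v: Matrix, mask: List[List[bool]]) -> Matrix:
    """Single-head scaled dot-product attention with a boolean mask.

    mask[i][j] is True when query i may attend to key j; every row needs at
    least one True entry, otherwise the softmax is undefined.
    """
    d = len(q[0])
    out: Matrix = []
    for i, qi in enumerate(q):
        # Masked-out keys get -inf, so exp() turns their weight into exactly 0.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total for c in range(len(v[0]))])
    return out


# Three tokens: 0-1 form the previous sentence, 2 is the current sentence.
sent = [0, 0, 1]
short = [[sent[i] == sent[j] for j in range(3)] for i in range(3)]
q = [[1.0], [1.0], [1.0]]
v = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(masked_attention(q, q, v, short)[2])  # [0.0, 0.0, 1.0]
```

Under the short-term mask, token 2 ignores the previous sentence entirely; with an all-True mask it would instead spread its weight over all three tokens. A sentence-local mask always keeps the diagonal, so every row has at least one allowed key.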


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications