Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

Wenxuan Wang; Wenxiang Jiao; Yongchang Hao; Xing Wang; Shuming Shi; Zhaopeng Tu; Michael Lyu

arXiv:2203.08442 · cs.CL · March 17, 2022 · 1 citation

TL;DR

This paper investigates the effects of Seq2Seq pretraining in neural machine translation, revealing its benefits and limitations, and proposes strategies to enhance translation quality and robustness.

Contribution

The paper analyzes in detail how Seq2Seq pretraining affects NMT and introduces in-domain pretraining and input adaptation to address the issues it identifies.

Findings

01 Seq2Seq pretraining improves translation diversity and reduces adequacy-related translation errors.

02 Discrepancies between pretraining and fine-tuning limit translation quality.

03 The proposed strategies enhance translation performance and robustness.

Abstract

In this paper, we present a substantial step in better understanding the SOTA sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: On one hand, it helps NMT models to produce more diverse translations and reduce adequacy-related translation errors. On the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit the translation quality (i.e., domain discrepancy) and induce the over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose simple and effective strategies, named in-domain pretraining and…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence