On the Copying Behaviors of Pre-Training for Neural Machine Translation

Xuebo Liu; Longyue Wang; Derek F. Wong; Liang Ding; Lidia S. Chao; Shuming Shi; Zhaopeng Tu

arXiv:2107.08212 · cs.CL · July 20, 2021 · 1 citation

TL;DR

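Initializing neural machine translation (NMT) models from pre-trained language models makes them copy more source tokens into the output, a side effect of the mismatch between the LM pre-training objective and the translation objective. Applying a copying penalty during decoding controls this behavior and improves translation quality across various benchmarks.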

Contribution

This work shows that pre-training changes the copying behaviors of NMT models, introduces the copying ratio metric to quantify the effect, and proposes a decoding-time copying penalty to control it, which improves translation performance.

Findings

01. NMT models initialized from pre-trained LMs have a higher copying ratio than standard NMT models.

02. Applying the copying penalty during decoding improves translation quality.

03. The method is effective across multiple translation benchmarks.
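To make the first finding concrete, here is a minimal sketch of how a copying ratio could be computed. It assumes the ratio is the fraction of output tokens that also occur in the corresponding source sentence, aggregated over a corpus; the exact definition used in the paper is not given on this page, and the function name `copying_ratio` is hypothetical.

```python
from typing import Sequence


def copying_ratio(
    sources: Sequence[Sequence[str]],
    hypotheses: Sequence[Sequence[str]],
) -> float:
    """Fraction of hypothesis tokens that also appear in their source sentence.

    ASSUMPTION: this is an illustrative definition, not necessarily the
    one used in the paper.
    """
    copied = 0
    total = 0
    for src, hyp in zip(sources, hypotheses):
        src_tokens = set(src)
        # Count output tokens that are identical to some source token.
        copied += sum(1 for tok in hyp if tok in src_tokens)
        total += len(hyp)
    # Guard against an empty corpus.
    return copied / total if total else 0.0


# Toy example: one of three output tokens ("Berlin") is copied from the source.
ratio = copying_ratio(
    [["das", "ist", "Berlin"]],
    [["this", "is", "Berlin"]],
)
```

Under this assumed definition, some copying is expected and correct (names, numbers), so the metric is only meaningful when compared between systems on the same test set.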

Abstract

Previous studies have shown that initializing neural machine translation (NMT) models with the pre-trained language models (LM) can speed up the model training and boost the model performance. In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, the pre-training initialization would affect the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training based NMT models have a larger copying ratio than the standard one. In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding. Extensive…
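The idea of controlling copying at decoding time can be illustrated with a small sketch. This is not the paper's formulation or the code in the official repository: it assumes, for illustration only, that the penalty is a constant subtracted from the log-probability of any candidate token that also appears in the source sentence. The names `apply_copying_penalty` and `penalty` are hypothetical.

```python
import math
from typing import Dict, Iterable


def apply_copying_penalty(
    log_probs: Dict[str, float],
    source_tokens: Iterable[str],
    penalty: float = 1.0,
) -> Dict[str, float]:
    """Lower the score of candidate tokens that would copy a source token.

    ASSUMPTION: a constant subtractive penalty on log-probabilities; the
    paper's actual penalty may be defined differently.
    """
    src = set(source_tokens)
    return {
        tok: (lp - penalty) if tok in src else lp
        for tok, lp in log_probs.items()
    }


# Toy next-token distribution where copying "Haus" is the most likely choice.
step_log_probs = {
    "Haus": math.log(0.5),
    "house": math.log(0.4),
    "home": math.log(0.1),
}
adjusted = apply_copying_penalty(step_log_probs, ["das", "Haus"], penalty=0.5)

# Greedy choice before and after the penalty.
best_before = max(step_log_probs, key=step_log_probs.get)
best_after = max(adjusted, key=adjusted.get)
```

The adjusted scores are no longer normalized log-probabilities, which is fine for ranking candidates in greedy or beam search. A penalty of 0 leaves decoding unchanged, so the strength of the effect can be tuned on a development set.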

Code & Models

Repositories

SunbowLiu/CopyingPenalty (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications