On the Copying Behaviors of Pre-Training for Neural Machine Translation
Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

TL;DR
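Initializing neural machine translation (NMT) models from pre-trained language models makes them more prone to copying source tokens, because the LM objective rewards copying most of the input while translation does not. Applying a copying penalty during decoding controls this behavior and improves translation quality across various benchmarks.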
Contribution
This work identifies the impact of pre-training on copying behaviors in NMT and proposes a copying penalty method to mitigate this issue, enhancing translation performance.
Findings
NMT models initialized from pre-trained LMs have higher copying ratios than standard NMT models.
Copying penalty improves translation accuracy.
Method is effective on multiple benchmarks.
Abstract
Previous studies have shown that initializing neural machine translation (NMT) models with the pre-trained language models (LM) can speed up the model training and boost the model performance. In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, the pre-training initialization would affect the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training based NMT models have a larger copying ratio than the standard one. In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding. Extensive…
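The abstract names a copying ratio metric and a decoding-time copying penalty but this page does not give their formulas. The sketch below is one plausible reading, not the paper's definition: it assumes the copying ratio is the fraction of output tokens that also occur in the paired source sentence (with clipped counts), and it assumes the penalty is subtracted from a candidate's log-probability with a weight `alpha`. The function names `copying_ratio` and `penalized_score` and the parameter `alpha` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def copied_token_count(source: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Count hypothesis tokens that also occur in the source (clipped by source counts)."""
    src_counts = Counter(source)
    hyp_counts = Counter(hypothesis)
    return sum(min(n, src_counts[tok]) for tok, n in hyp_counts.items())


def copying_ratio(sources: Sequence[Sequence[str]],
                  hypotheses: Sequence[Sequence[str]]) -> float:
    """Corpus-level share of output tokens copied from the source (assumed definition)."""
    copied = sum(copied_token_count(s, h) for s, h in zip(sources, hypotheses))
    total = sum(len(h) for h in hypotheses)
    return copied / total if total else 0.0


def penalized_score(log_prob: float,
                    source: Sequence[str],
                    hypothesis: Sequence[str],
                    alpha: float = 1.0) -> float:
    """Rescore a candidate by subtracting alpha times its copied-token fraction.

    This is an illustrative penalty, not the formulation used in the paper.
    """
    if not hypothesis:
        return log_prob
    fraction = copied_token_count(source, hypothesis) / len(hypothesis)
    return log_prob - alpha * fraction


if __name__ == "__main__":
    src = ["das", "ist", "Berlin"]
    translated = ["this", "is", "Berlin"]   # copies only the named entity
    copied = ["das", "ist", "Berlin"]       # copies the whole source
    print(copying_ratio([src], [translated]))
    print(penalized_score(-1.0, src, translated), penalized_score(-1.0, src, copied))
```

Under these assumptions, two candidates with the same model score are ranked so that the one copying fewer source tokens wins, and larger values of `alpha` push the decoder further away from copying. Legitimate copies such as named entities are penalized too, so `alpha` would need tuning on held-out data.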
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
